Introduction

This  was  an  Academic  Award  (Career  Development  Award).  The  purpose  of  this  application  was 
to free additional time for the Principal Investigator to “...appraise critically the state of the
science in a particular aspect of breast cancer research and to forge new avenues of
investigation.” The PI continues to apply new, state-of-the-art technologies to identify key
endocrine-regulated  molecular  pathways  to  apoptosis/proliferation.  By  identifying  key 
components  of  these  pathways,  we  may  be  able  to  predict  response  to  first-line  and  crossover 
antiestrogenic  therapies,  and/or  provide  novel  therapeutic  strategies  for  antiestrogen  resistant 
tumors. 

Body 

This  is  an  Academic  Award,  for  which  a  detailed  research  plan  was  not  required.  Since  the  award 
is  to  support  academic  development,  the  aims  are  not  finite,  i.e.,  restricted  only  to  the  time  frame 
or resources provided through this type of award. Furthermore, unlike an R01-style application,
the amount of work proposed represents the efforts of a number of individuals and funded grants
already active within the PI’s laboratory, and both ongoing and future collaborations with other
laboratories.  Consistent  with  this,  the  proposed  work  requires  substantially  more  than  the  time 
and  financial  resources  provided  by  a  single  R01.  Without  describing  the  work  in  this  manner,  it 
was  unclear  how  we  could  address  the  requirements  of  this  new  award  category.  The  aims, 
amount  of  work  proposed  (which  must,  e.g.,  go  beyond  the  three  year  limit  to  satisfy  the  award 
requirements)  and  time  frames  were  presented,  in  the  original  application,  with  these  issues  in 
mind.  To  prevent  duplication  and  to  limit  the  size  of  this  report,  published  data  are  provided  in 
the  reprints,  rather  than  being  recapitulated  in  the  text,  and  very  preliminary  data  are  described 
but  not  shown. 

Aim  1:  We  will  expand  the  MCF7/LCC1  and  MCF7/LCC9  databases  to  a  minimum  of  30,000 
tags/database.  We  also  expect  to  establish  a  30,000  tag  database  for  MCF-7  cells  growing 
with and without 17β-estradiol. Completion of all four databases will require longer than
the  three  year  period,  since  we  also  plan  to  perform  functional  studies  on  candidate  genes 
identified  from  our  comparisons  of  the  MCF7/LCC9  and  MCF7/LCC1  databases.  For  the 
purposes  of  this  application’s  duration,  we  would  consider  this  aim  to  have  been 
successfully  completed  once  the  MCF7/LCC1  and  MCF7/LCC9  databases  have  each 
reached  a  size  of  30,000  tags.  Time:  years  1-3. 

1.  We  have  completed  the  initial  study  and  the  manuscript  was  published  last  year  in  Cancer 
Research  -  a  reprint  is  included  in  the  appendix  (Gu  et  al.  Cancer  Res  62:  3428-3437,  2002). 

Aim  2:  We  will  continue  to  investigate  the  functional  relevance  of  those  genes/proteins  that 
receive  sufficient  priority.  This  will  include  transient  transfection  studies  with  promoter-reporter 
constructs  (for  transcription  modulating  factors)  and  stable  transfections  to  assess  functional 
relevance.  We  also  will  investigate  clinical  relevance  by  exploring  expression  in  breast  tumor 
biopsies,  and  correlating  expression  (or  lack  thereof)  with  established  prognostic  variables,  e.g., 
lymph  node  status,  ER  expression,  S-phase/proliferation,  tumor  grade,  disease  free  and  overall 
survival  and  response  to  endocrine  and  cytotoxic  chemotherapies.  For  the  purposes  of  this 
application’s  duration,  we  would  consider  this  aim  to  have  been  successfully  completed  if  we  can 
confirm  the  roles  of  NPM,  NFkB,  CRE/hXBP-1  and  the  IRF-1  polymorphism.  Time:  years  1-3. 

We have continued our studies focusing on the human X-box binding protein-1 (hXBP-1) and
interferon regulatory factor-1 (IRF-1). We have successfully overexpressed hXBP-1 in both
MCF-7  and  T47D  cells  and  have  selected  both  pooled  populations  and  individual  clones  of 
overexpressing  cells.  We  have  confirmed  overexpression  in  the  MCF-7  cells  and  will  do  so 
shortly  in  the  T47D  cells.  These  models  will  be  used  to  study  further  the  role  of  hXBP-1  in 
endocrine  resistance  and  in  affecting  breast  cancer  cell  biology  in  general. 

We have made substantial progress with the putative tumor suppressor IRF-1, particularly
with  our  dominant  negative  (dnIRF-1).  Our  data  now  show  that  we  can  specifically  separate  the 
cell  cycle  arrest  effects  of  ICI  182,780  (Faslodex;  Fulvestrant)  from  the  proapoptotic  effects  in 
antiestrogen  sensitive  cells.  Proapoptotic  effects  of  ICI  182,780  are  fully  ablated  in  the  presence 
of  dnIRF-1,  whereas  the  cell  cycle  effects  are  unaffected  by  dnIRF-1.  We  have  also  determined 
that  the  regulation  of  IRF-1  mRNA  is  endocrine  regulated,  being  inhibited  by  estradiol  and 
induced  by  ICI  182,780  in  sensitive  cells  (MCF-7  and  T47D).  This  endocrine  regulation  is  lost  in 
antiestrogen  resistant  cells.  These  effects  are  fully  consistent  with  the  ability  of  IRF-1  to  induce  a 
caspase  cascade.  Of  notable  importance,  the  effect,  which  appears  mediated  by  estrogen 
receptors,  does  not  affect  responsiveness  to  cytotoxic  drugs  that  also  signal  to  apoptosis  through 
IRF-1,  i.e.,  these  drugs  still  induce  apoptosis  through  IRF-1  mediated  signaling. 

1. These data were used to support an R01 application to NIH on IRF-1 and endocrine resistance
and  an  IDEA  application  to  DOD  on  IRF-1  and  cytotoxic  drug  responsiveness. 

2. We have completed a draft of the studies of IRF-1 on endocrine resistance; a manuscript on
IRF-1 and cytotoxic drug responsiveness is in preparation.

We have also now completed an initial study of the patterns of expression of the nucleophosmin
(NPM), IRF-1, hXBP-1 and NFkB proteins in breast cancer. Using tissue microarrays of 480
cores from 54 breast carcinomas and standard indirect immunoperoxidase procedures for
immunohistochemistry, we first confirmed the known coexpression of ER and PgR (see Table).
Since IRF-1 is a transcription factor with tumor suppressor activity, we might expect activated
protein to be in the nucleus (IRF-1n) and inactive protein to be in the cytosol (IRF-1c), and an
inverse relationship between IRF-1n and survival factors such as NFkB and XBP-1. Coexpression
of survival/mitogenic activities also might be expected. While some correlations are of borderline
significance, we find IRF-1n inversely correlated, and IRF-1c positively correlated, with both
NFkB and XBP-1. We also find coexpression of NFkB and XBP-1. These data are broadly
consistent with the hypothetical gene signaling network we have implicated in driving
antiestrogen resistance.

3.  These  data  were  used  to  support  an  IDEA  application  to  DOD  to  study  the  expression  patterns 
of  these  proteins  in  cases  from  the  Scottish  Adjuvant  Tamoxifen  Trial. 


Table: Correlation of IRF-1, XBP-1, and NFkB expression from tissue microarrays.
*Numbers are p-values. (-) = inverse correlation, (+) = direct correlation.
IRF-1c = cytoplasmic staining; IRF-1n = nuclear staining; NS = not significant.

          ERa         PgR         IRF-1c      IRF-1n      NFkB
PgR       0.001 (+)   -
IRF-1c    0.079 (+)   NS
IRF-1n    NS          0.014 (+)   0.088 (-)
NFkB      NS          NS          0.002 (+)   0.034 (-)
XBP-1     NS          NS          0.001 (+)   0.082 (-)   0.018 (+)
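
As an illustration only, the kind of pairwise comparison summarized in the table can be sketched with a rank correlation. The report does not state which correlation statistic was used, so the Spearman coefficient below is an assumption, and the staining scores are hypothetical values invented for the example, not data from the tissue microarrays.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical staining scores (0-3) for six cores; not data from the report.
irf1_nuclear = [3, 2, 2, 1, 0, 0]
nfkb = [0, 1, 1, 2, 3, 3]
rho = spearman(irf1_nuclear, nfkb)
```

A negative coefficient corresponds to the "(-)" entries in the table and a positive one to the "(+)" entries; a p-value would additionally require a significance test that this sketch does not implement.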

4. Dr. Yuelin Zhu (postdoctoral fellow) received a prestigious Merck Senior Fellow Award from
the Endocrine Society (June, 2003) for his poster presenting this work.


Aim  3:  We  will  continue  to  integrate  the  emerging  experimental  data  into  our  molecular 
transduction  schemes,  and  amend  these  as  appropriate.  Clearly,  this  will  require 
substantial  ongoing  effort  to  integrate  the  studies  from  the  more  broad-based  projects, 
e.g.,  SAGE  and  gene  array,  with  the  more  focused  and  functional  studies,  e.g.,  those 
specifically addressing the function of NPM and IRF-1. Time: years 1-3.

We continue to increase our collaborations with the informatics group at Catholic University of
America (which has recently expanded to include researchers at Virginia Tech) and to develop new
algorithms for studying the transcriptomes of endocrine sensitive and resistant breast cancers.

1.  The  method  described  in  the  previous  report  has  been  published  (Wang  et  al,  IEEE  Trans  Inf 
Technol  Biomed,  6:  29-37,  2002)  -  a  reprint  is  included  in  the  appendix. 

2.  The  method  for  collecting  breast  biopsies  for  microarray  studies,  which  was  described  in  the 
previous report, has been published (Ellis et al., Clin Cancer Res, 8: 1155-1166, 2002) - a reprint
is  included  in  the  appendix. 

We  also  have  published  a  new  method  for  analyzing  gene  expression  microarray  data. 

3. Liu et al., Stat Med, 21: 3465-3474, 2002 - a reprint is included in the appendix.

We  also  continue  our  studies  of  P-glycoprotein,  and  have  recently  published  a  study  of 
progesterone  analogue  inhibitors  of  this  efflux  pump  and  an  extensive  review  of  P-glycoprotein 
mediated  drug  resistance. 

4. Leonessa, F., Kim, J.-H., Ghiorghis, A., Kulawiec, R., Hammer, C., Talebian, A. & Clarke, R.
“C-7  Analogs  of  progesterone  as  potent  inhibitors  of  the  P-Glycoprotein  efflux  pump.”  J  Med 
Chem,  45:  390-398,  2002. 

5.  Leonessa,  F.  &  Clarke,  R.  "ABC  transporters  and  drug  resistance  in  breast  cancer."  Endocr 
Related  Cancer,  10:  43-73,  2003. 


Each of the above aims represents ongoing studies within the PI’s laboratory and each will
continue  beyond  the  limitations  of  this  award.  We  will  continue  to  evaluate  new  methodologies 
and  adapt  our  approaches  and  integrative  studies  in  the  light  of  published  work  from  other 
laboratories.  In  this  latter  regard,  the  award  specifically  allowed  the  PI  to  spend  more  time 
critically  appraising  the  state  of  science  in  the  area  of  resistance  to  estrogens  and 
antiestrogens  in  breast  cancer. 

Key Research Accomplishments (bulleted)

•  Published  several  manuscripts  and  reviews  on  breast  cancer,  estrogen,  antiestrogens  and 
drug  resistance  (listed  below). 

•  Used  data  from  the  studies  described  in  the  final  report  to  support  two  successful  breast 
cancer  applications  to  NIH  (as  PI:  a  prestigious  Bioengineering  Research  Partnership 
award  to  fund  the  molecular  characterization  of  breast  cancer  -  R01-CA096483  and  as 
co-PI  on  a  program  project  to  study  the  effects  of  the  timing  of  dietary  exposure  on 
mammary gland development and breast cancer susceptibility - U54-CA100970).

• Used data from the studies described in the final report to support a successful technology
development R21/R33 to NIH (R21/R33-EB000830: Computational decomposition of
composite molecular signatures).

•  Used  data  from  the  studies  described  in  the  final  report  and  appended  publications  to 
support new applications to NIH and DOD. Thus, an R21/R33 and an R01 have been
submitted to NIH and a CTR preproposal was successfully submitted to DOD (full
proposal is due in August 2003).

• Completed a pilot study applying tissue microarrays to human breast cancers and showed
the coexpression of several key components of our putative signaling network.

•  Submitted  a  major  review  (by  invitation)  on  antiestrogen  resistance  to  the  journal 
Oncogene. 


Reportable  Outcomes 

Reportable  outcomes  are  presented  as  A.  Manuscripts,  Abstracts  and  Presentations;  B.  Other 
Professional  Activities;  C.  Degrees;  and  D.  Funding  Applied  for  Based  on  Work  Supported  by 
this  Award. 

A.  Manuscripts,  Abstracts  and  Presentations 

Consistent  with  the  goals  of  allowing  the  PI  to  spend  time  reevaluating  his  field,  the  PI  has 
recently  published  a  major  review  entitled  “Cellular  and  Molecular  Pharmacology  of 
Antiestrogen  Action  and  Resistance”  in  the  peer  review  journal  Pharmacological  Reviews.  A 
copy  of  this  review  and  other  published  articles  are  included  in  the  appendix.  "In  press"  articles 
are  not  included  in  the  appendix. 


Manuscripts  (published  since  last  annual  report) 

1.  Johnson,  M.,  Kenney,  N.,  Hilakivi-Clarke,  L.,  Singh,  S.,  Chepko,  G.,  Newbold,  R., 
Clarke,  R.,  Sholler,  P.F.,  Lirio,  A.A.,  Foss,  C.,  Trock,  B.,  Paik,  S.,  Stoica,  A.  &  Martin, 
M.B.  “Cadmium  mimics  the  effects  of  estrogen  in  vivo  in  the  uterus  and  mammary 
gland.”  Nature  Med,  in  press. 

2. Gu, Z., Lee, Y.R., Skaar, T.C., Bouker, K.B., Welch, J.N., Lu, J., Liu, A., Davis, N.,
Wang, Y. & Clarke, R. “Association of interferon regulatory factor-1, nucleophosmin,
nuclear factor-κB and cAMP response element binding with acquired resistance to
Faslodex (ICI 182,780).” Cancer Res, 62: 3428-3437, 2002.

3.  Ellis,  M.,  Davis,  N.,  Coop,  A.,  Liu,  M.,  Schumaker,  L.,  Lee,  R.Y.,  Srikanchana,  R., 
Russell,  C.,  Singh,  B.,  Miller,  W.R.,  Stearns,  V.,  Pennanen,  M.,  Tsangaris,  T.,  Gallagher, 
A.,  Liu,  A.,  Zwart,  A.,  Hayes,  D.F.,  Lippman,  M.E.,  Wang,  Y.  &  Clarke,  R. 

2. Co-Principal Investigator (Principal Investigator; Leena A. Hilakivi-Clarke, Ph.D.):
N.I.H.  U54-CA100970:  "Timing  of  dietary  exposure  and  breast  cancer  risk."  Dr.  Clarke 
runs  a  Biostatistics  and  Microarray  Core,  co-directs  the  Administrative  Core,  and  is  a 

Grant to study the effects of the timing of estrogenic and other exposures on the

4. Principal Investigator - subcontract: (Principal Investigator of host grant; Yue Wang,
Ph.D.):  N.I.H.  R21/R33-EB000830:  "Computational  decomposition  of  composite 
molecular  signatures".  This  is  an  R21/R33  award  to  study  algorithms  for  the  in  silico 
partial  volume  correction  of  complex  tissues  (breast  biopsies);  essentially  an  in  silico 
method  for  microdissection  in  gene  expression  microarray  studies. 

5.  Principal  Investigator:  USAMRMC:  IDEA  Award  BC990358  “Molecular 
characterization  of  resistance.”  This  award  funds  gene  microarray  analysis  of  antiestrogen 
responsive  and  resistant  human  breast  cancer  cell  lines. 

Conclusions

We  have  made  considerable  progress  in  addressing  our  proposed  aims  during  the  period  of  the 
entire  award.  The  time  made  available  to  the  PI  through  this  Academic  Award  has  resulted  in 
several  relevant  publications  and  reviews,  the  ability  to  attract  significant  additional  funding 
related  to  the  research,  and  the  generation  of  preliminary  data  that  could  lead  to  further 
publications/grants  in  the  coming  years.  The  PI  also  continues  to  participate  in  other  related 
professional  activities.  For  example,  during  this  award  the  PI  was  promoted  to  full  professor  with 
tenure  and  used  the  time  freed  by  this  award  to  complete  and  successfully  submit  his  D.Sc. 
thesis.  He  also  was  appointed  to  three  additional  editorial  boards,  including  Cancer  Research, 
and was appointed as chair of an NIH Study Section (ZATI SEP DB 01 Basic Science). These
professional  appointments  and  continued  level  of  productivity  would  not  have  been  achieved  had 
this  award  not  been  forthcoming. 

Cadmium mimics the effects of estrogen in vivo in
the uterus and mammary gland

Michael  Johnson1,5,  Nicholas  Kenney2,5,  Adriana  Stoica1,5,  Leena  Hilakivi-Clarke1,  Baljit  Singh1,  Gloria  Chepko1, 
Robert  Clarke1,  Peter  F  Sholler1,  Apolonio  A  Lirio1,  Colby  Foss3,  Ronald  Reiter1,  Bruce  Trock1,  Soonmyoung  Paik4 
&  Mary  Beth  Martin1 


It  has  been  suggested  that  environmental  contaminants  that 
mimic  the  effects  of  estrogen  contribute  to  disruption  of  the 
reproductive  systems  of  animals  in  the  wild,  and  to  the  high 
incidence  of  hormone-related  cancers  and  diseases  in  Western 
populations.  Previous  studies  have  shown  that  functionally, 
cadmium  acts  like  steroidal  estrogens  in  breast  cancer  cells  as 
a  result  of  its  ability  to  form  a  high-affinity  complex  with  the 
hormone  binding  domain  of  the  estrogen  receptor1-2.  The 
results  of  the  present  study  show  that  cadmium  also  has  potent 
estrogen-like  activity  in  vivo.  Exposure  to  cadmium  increased 
uterine  wet  weight,  promoted  growth  and  development  of  the 
mammary  glands  and  induced  hormone-regulated  genes  in 
ovariectomized  animals.  In  the  uterus,  the  increase  in  wet 
weight  was  accompanied  by  proliferation  of  the  endometrium 
and  induction  of  progesterone  receptor  (PgR)  and  complement 
component  C3.  In  the  mammary  gland,  cadmium  promoted  an 
increase  in  the  formation  of  side  branches  and  alveolar  buds 
and  the  induction  of  casein,  whey  acidic  protein,  PgR  and  C3. 
In  utero  exposure  to  the  metal  also  mimicked  the  effects  of 
estrogens.  Female  offspring  experienced  an  earlier  onset  of 
puberty  and  an  increase  in  the  epithelial  area  and  the  number 
of  terminal  end  buds  in  the  mammary  gland. 

To determine whether in vivo exposure to an environmentally relevant dose of
cadmium results in estrogen-like activity in target organs, female rats were
ovariectomized on postnatal day 28 and allowed to rest for 3 weeks. Rats were
then given a single intraperitoneal dose of cadmium (5 µg per kg body weight,
or ~27 nmol/kg). The antiestrogen ICI-182,780 was administered
intraperitoneally at a dose of 500 µg per kg per d. As a positive control, rats
received a pellet of estradiol releasing 60 µg per kg per d. The uterine
response was measured 4 d after treatment, and the mammary gland response was
measured 4 and 14 d later. As expected, there was a 3.8-fold increase in
uterine wet weight in the estradiol-treated animals (Table 1). In the
cadmium-treated animals, there was a 1.9-fold increase in uterine wet weight
that was blocked by the antiestrogen. A similar increase in uterine wet weight
was also observed when the animals were given a single intraperitoneal 10 µg/kg
dose of cadmium (1.7-fold increase) or when the animals were ovariectomized on
day 40 (1.8-fold increase). Histological examination showed that the
cadmium-induced increase in uterine wet weight was due to a mitogenic response
and not due to toxicity (Fig. 1). In the ovariectomized control animals, the
endometrial lining was flat to cuboidal. No vacuolation or stromal inflammation
was observed. In addition, no mitoses were observed in either the endometrial
or stromal cells. In the estradiol-treated animals, the endometrial lining
showed epithelial hyperplasia and hypertrophy. The endometrial cells were
taller and had abundant granular cytoplasm. The surrounding stroma was
hypercellular and was infiltrated by numerous eosinophils. Cadmium-treated
animals also showed hyperplasia and hypertrophy. In addition, the cells showed
abundant subnuclear and supranuclear vacuolation. The stroma was more cellular
in the cadmium-treated animals than in the ovariectomized animals and was less
cellular than in the animals treated with estradiol. No stromal inflammatory
infiltrate was noted in the cadmium-treated animals. Both estradiol- and
cadmium-treated animals showed rare mitoses in the endometrial cells. The
ability of the antiestrogen to block the effects of cadmium suggests that the
effects of the metal are mediated by the estrogen receptor. There was no
evidence of toxicity in the liver or kidney, organs sensitive to the toxic
effects of the metal (data not shown), and there was no effect on whole body
weight.
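
The mass-to-molar conversion quoted for the single dose (5 µg per kg, about 27 nmol/kg) can be checked with a short calculation. This is a sketch under a stated assumption: the text does not name the cadmium compound that was injected, and the quoted figure is reproduced here only if the dose is taken as the mass of cadmium chloride (CdCl2) rather than of elemental cadmium. The function name is hypothetical.

```python
# Standard atomic masses in g/mol.
CD_ATOMIC_MASS = 112.41
CL_ATOMIC_MASS = 35.45

# Assumption: the dose refers to cadmium chloride; the text does not say so.
CDCL2_MOLAR_MASS = CD_ATOMIC_MASS + 2 * CL_ATOMIC_MASS


def ug_per_kg_to_nmol_per_kg(dose_ug_per_kg, molar_mass_g_per_mol):
    """Convert a mass dose (µg/kg) to a molar dose (nmol/kg)."""
    # µg divided by g/mol gives µmol; multiply by 1000 for nmol.
    return dose_ug_per_kg / molar_mass_g_per_mol * 1000.0


dose_as_cdcl2 = ug_per_kg_to_nmol_per_kg(5.0, CDCL2_MOLAR_MASS)
dose_as_elemental_cd = ug_per_kg_to_nmol_per_kg(5.0, CD_ATOMIC_MASS)
```

Under the cadmium chloride assumption the result rounds to 27 nmol/kg, matching the text; treating the dose as elemental cadmium would give roughly 44 nmol/kg instead.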

Ovarian steroids also have a central role in the growth and development of the
mammary gland (reviewed in refs. 3,4). In the rat, estrogens stimulate ductal
and stromal proliferation and the secretion of prolactin, which, in turn,
regulates lobuloalveolar development. To determine whether cadmium mimics the
effects of estrogens in the mammary gland, epithelial density was measured on
days 4 and 14 after treatment (Table 1 and data not shown). In the
ovariectomized control animals, the mammary gland consisted of a simple ductal
network with low epithelial density. After exposure to estradiol, there was a
50% increase in epithelial density on days 4 and 14, the result of an increase
in mammary ducts and secretory lobuloalveolar structures. In animals exposed to
cadmium, there was also a 50% increase in epithelial density by day 4 and a 30%
increase by day 14. The cadmium-induced augmentation was due to a rise in
quaternary branching of the ducts as well as an increase in lobuloalveolar
structures. In


1Department of Oncology, Lombardi Cancer Center, Georgetown University, Washington, DC 20007, USA. 2Department of Biology, Hampton University, Hampton,
Virginia 23668, USA. 3Department of Chemistry, Georgetown University, Washington, DC 20007, USA. 4University of Pittsburgh, Pittsburgh, Pennsylvania 15238,
USA. 5These authors contributed equally to this work. Correspondence should be addressed to M.B.M. (martinmb@georgetown.edu).


NATURE MEDICINE VOLUME 9 | NUMBER 8 | AUGUST 2003




[Figure 1; panel labels: Control, Estradiol]

Figure 1 Histological effects of cadmium in the uteri of ovariectomized rats
treated with cadmium, estradiol or ICI-182,780 (ICI). Sections were
stained with H&E (×200).


rats treated with the antiestrogen, the epithelial density was not
significantly different from control rats, but there were fewer secretory
structures in the glands of antiestrogen-treated rats on day 4, with a more
pronounced effect observed on day 14. The antiestrogen also blocked the effects
of cadmium on epithelial density and secretory structures, suggesting that the
response of the mammary gland to cadmium is mediated by the estrogen receptor.
To assess whether cadmium induced a secretory differentiation of the gland, the
expression of casein and whey acidic protein was also examined on day 14 (data
not shown). In control rats, the mammary gland was devoid of casein, whereas
animals treated with estradiol for 14 d synthesized significant amounts of the
protein. Casein was found in the ductal lumen, alveolar cells and alveolus.
Rats exposed to a single dose of cadmium also synthesized significant amounts
of casein. The protein was localized in lumenal cells, alveolar cells, alveolus
and ductal lumen, with most of the casein immunolocalized in the cytoplasm.
Expression of whey acidic protein was also detected but was not as abundant.
The ability of cadmium to induce the synthesis of both casein and whey acidic
protein indicates that the metal induces milk protein synthesis in the mammary
gland.

To determine whether cadmium also mimics the effects of estradiol on gene
expression, we measured the amounts of PgR and complement protein C3 mRNA
(Fig. 2). In the uterus, estradiol treatment resulted in a 3-fold increase in
PgR mRNA and a 124-fold increase in C3 mRNA. Similarly, treatment with cadmium
induced a 2-fold increase in PgR mRNA and a 12-fold increase in complement C3
mRNA (P < 0.001). In the mammary gland, estradiol induced a 42- and 416-fold
increase in PgR mRNA and C3 mRNA, respectively, and cadmium induced a 9-fold
increase in PgR mRNA and a 16-fold increase in C3 mRNA (P < 0.001). The
increase in PgR and C3 mRNA expression in both organs was blocked by the
antiestrogen, providing additional evidence that the effects of cadmium are
mediated by the estrogen receptor.

The amount of cadmium in the uterus and mammary gland was also determined using
anodic stripping voltammetry, an electrochemical technique that offers
detection limits in the sub-parts-per-billion range5. Cadmium was not detected
in the organs of control animals but was detectable in most organs of the
metal-treated animals. However, the amount of cadmium was too low to accurately
quantitate. When detectable in the uterus or mammary gland, the amount of
cadmium was approximately 10^-2 µg per g tissue (10 5 parts per billion).

In utero exposure to estrogens and estrogen-like substances causes early onset
of puberty6,7 and alters mammary gland development in female offspring6,8,9. To
assess the estrogenic effects of cadmium after in utero exposure, pregnant rats
were given two injections of cadmium intraperitoneally at a dose of either 0.5
or 5 µg per kg body weight on days 12 and 17 of gestation. Cadmium did not
alter pregnancy weight gain, number of pups per litter or birth weights (data
not shown). On postnatal day 35, however, female offspring exposed to the lower
dose of cadmium had significantly increased body weights (135.8 ± 2.4 g,
mean ± s.e.m.) compared with control offspring (120 ± 2.6 g; T(2#i5) = 13.8,
P < 0.001). This temporary increase in weight is consistent with in utero
exposure to estrogenic compounds10. Also consistent with in utero exposure to
low doses of estrogens6, there was no difference in uterine wet weight (either
crude or adjusted for body weight) between control animals and animals exposed
to cadmium (data not shown). In utero exposure to cadmium also induced an
earlier onset of vaginal opening6,7. Vaginal opening occurred on average on day
30.6 ± 0.6 in control animals, and on day 27.2 ± 1.1 (P < 0.05) and day
26.7 ± 1.1 (P < 0.05) in animals exposed to cadmium doses of 0.5 and 5 µg/kg,
respectively.

The effects of in utero exposure to cadmium on mammary gland development were
assessed on postnatal day 35 during the rapid growth phase of the gland. As
with perinatal exposure to estrogenic compounds6, in utero exposure to cadmium
increased the parenchymal area of the mammary gland and the number of terminal


Table 1  Effects of cadmium on uterine wet weight and mammary gland density in
ovariectomized animals

                        Uterine weight (day 4)                      Mammary gland density                             Body weight
Treatment               Grams                       Fold increase   Day 4                     Day 14                    Grams
Control                 0.075 (±0.0069; n = 17)     -               54.3 (±2.5; n = 17)       75.4 (±1.9; n = 9)        187 (n = 9)
Cadmium                 0.14* (±0.0111; n = 21)     1.9             82.8*** (±4.0; n = 20)    99.8*** (±2.9; n = 14)    189 (n = 12)
ICI-182,780             0.048 (±0.0027; n = 13)     0.64            69.7 (±3.7; n = 8)        72.7 (±2.2; n = 8)        182 (n illegible)
Cadmium + ICI-182,780   0.046 (±0.0035; n = 11)     0.61            69.0 (±5.4; n = 8)        72.2 (±2.8; n = 8)        182 (n = 8)
Estradiol               0.284** (±0.0168; n = 22)   3.8             84.1*** (±3.2; n = 20)    112.6*** (±6.9; n = 11)   172 (n = 10)

Uterine wet weight and mammary gland density in ovariectomized rats treated with cadmium, estradiol or
ICI-182,780. Uterine wet weights and epithelial density are shown as mean ± s.e.m. *, P = 0.0001 compared
with control; **, P < 0.0001 compared with control. Epithelial density data were analyzed by one-way ANOVA
(F(4,68) = 12.41, P < 0.001 for day 4; F(4,46) = 20.73, P < 0.001 for day 14). ***, P < 0.05 (significantly
different from controls).
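
The fold-increase column of Table 1 follows directly from the mean uterine weights, and the group sizes fix the degrees of freedom of a one-way ANOVA. The sketch below recomputes both from the tabulated values; the variable names are illustrative, and the group sizes are the day-4 mammary gland density counts as read from the table.

```python
# Mean uterine wet weights in grams (day 4), as listed in Table 1.
CONTROL_WEIGHT_G = 0.075
treated_weight_g = {
    "Cadmium": 0.14,
    "ICI-182,780": 0.048,
    "Cadmium + ICI-182,780": 0.046,
    "Estradiol": 0.284,
}

# Fold increase = treated mean divided by control mean.
fold_increase = {
    name: weight / CONTROL_WEIGHT_G for name, weight in treated_weight_g.items()
}

# Day-4 mammary gland density group sizes, in the table's row order.
day4_group_sizes = [17, 20, 8, 8, 20]
df_between = len(day4_group_sizes) - 1                    # groups minus one
df_within = sum(day4_group_sizes) - len(day4_group_sizes)  # animals minus groups
```

Rounded to the precision used in the table, the ratios reproduce the listed fold increases, and the five day-4 groups give 4 and 68 degrees of freedom.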

Figure 2 Effects of cadmium on the expression of PgR and C3 in
ovariectomized animals treated with cadmium, estradiol or ICI-182,780.
Total RNA was quantified using a real-time PCR assay. Data represent
mean ± s.d. of values from three independent experiments done in
duplicate. C, control; E2, estradiol; ICI, ICI-182,780; Cd, cadmium.

end buds and decreased the number of alveolar buds6. The mammary epithelial
area was significantly larger, at 70.7 ± 5.2 and 66.5 ± 7.7 (mean ± s.e.m.) in
rats exposed to cadmium doses of 0.5 and 5 µg/kg, respectively, compared with
45.5 ± 4.2 in control rats (Fig. 3a). The mammary glands of rats exposed to the
lower dose of cadmium also contained significantly more terminal end buds
(12.5 ± 1.0) than control rats (9.4 ± 0.2; Fig. 3b). Both doses of cadmium
reduced the number of alveolar buds in the mammary gland from 15.0 ± 3.9 in
control rats to ~7.5 ± 1.5 in rats exposed to the metal (Fig. 3c). The ability
of cadmium to mimic the in utero effects of estrogens provides additional
support that environmentally relevant doses of the metal have potent
estrogen-like activities.

To date, most animal studies that have examined the toxic and
carcinogenic effects of cadmium have used doses [AU: OK AS EDITED?]
in the range of 1-5 mg/kg (~5-25 µmol/kg)11,12. To mimic human
exposure, we used a 1,000-fold lower [AU: ‘-FOLD LOWER’ IS
AMBIGUOUS; PLEASE GIVE A PERCENTAGE] dose of cadmium,
similar to dietary exposures according to the World Health
Organization-recommended Provisional Tolerable Weekly Intake of
7 µg per kg body weight per week. [AU: Add reference.] In the
United States, Germany, the United Kingdom and Sweden, dietary
cadmium exposure is estimated to range from 0.12 to 0.49 µg per kg
body weight per d, with the highest exposure occurring in children
1-6 years of age13-19. Cigarette smoking is also an important source of
exposure, contributing 2-4 µg of cadmium per pack per d (refs.
20,21).
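
The dose arithmetic in the preceding paragraph can be checked with a short sketch. It assumes that the quoted mass doses refer to anhydrous cadmium chloride (molar mass about 183.3 g/mol), which is consistent with the stated equivalence of 1-5 mg/kg and roughly 5-25 µmol/kg; the function and constant names are illustrative and do not come from the study.

```python
# Assumption: doses are expressed as mass of anhydrous CdCl2.
# Molar mass: Cd 112.41 + 2 x Cl 35.45 = 183.31 g/mol.
CDCL2_G_PER_MOL = 112.41 + 2 * 35.45

# WHO Provisional Tolerable Weekly Intake quoted in the text (ug per kg per week).
PTWI_UG_PER_KG_PER_WEEK = 7.0


def ug_per_kg_to_nmol_per_kg(dose_ug_per_kg, molar_mass=CDCL2_G_PER_MOL):
    """Convert a mass dose (ug/kg) to a molar dose (nmol/kg)."""
    # ug / (g/mol) gives umol; multiply by 1000 to obtain nmol.
    return dose_ug_per_kg / molar_mass * 1000.0


def weekly_intake_ug_per_kg(daily_ug_per_kg):
    """Scale a daily dietary intake (ug/kg/d) to a weekly intake (ug/kg/week)."""
    return daily_ug_per_kg * 7


if __name__ == "__main__":
    # 5 ug/kg is the lower experimental dose; 1000 and 5000 ug/kg span 1-5 mg/kg.
    for dose in (5.0, 1000.0, 5000.0):
        print(dose, "ug/kg =", round(ug_per_kg_to_nmol_per_kg(dose), 1), "nmol/kg")
    # Dietary range quoted in the text, compared with the weekly limit.
    for daily in (0.12, 0.49):
        weekly = weekly_intake_ug_per_kg(daily)
        print(daily, "ug/kg/d =", round(weekly, 2), "ug/kg/week;",
              "fraction of PTWI:", round(weekly / PTWI_UG_PER_KG_PER_WEEK, 2))
```

Under these assumptions, a 5 µg/kg dose corresponds to about 27 nmol/kg, and the upper dietary estimate of 0.49 µg/kg per d amounts to roughly half of the weekly tolerable intake.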

Exposure to cadmium is also influenced by its long half-life,
estimated to range from 10 to 30 years21, which may account for its
significant accumulation in the body. In newborns, the amount of
cadmium is negligible, but by age 30 the body burden can reach 30
mg. In nonsmokers, the concentration of cadmium in the kidney is
~15-20 µg per g tissue, whereas in smokers, the concentration
doubles to 30-40 µg per g tissue. High concentrations of cadmium are
also present in breast fat of healthy women and breast cancer patients
(20-30 µg per g tissue)22. In stark contrast to human breast tissue, the
mammary glands of experimental animals contained approximately
0.01 µg per g tissue. Although present in extremely low amounts,
cadmium had profound effects on the growth and development of the
gland, suggesting that exposure to the metal may be a potential risk
factor for breast cancer. However, the only epidemiological study to
suggest a link between cadmium exposure and breast cancer risk23 is a
hypothesis-generating case-control study based on death certificates
coded for occupation and industry. After excluding homemakers, the
study used a job exposure matrix to estimate the probability of
occupational exposure to cadmium and found odds ratios of
1.07-1.13 among white women and 1.5-2.3 among black women.
Further studies are required to substantiate these findings.

The data presented in this study provide strong evidence that
cadmium is a potent nonsteroidal estrogen in vivo. The ability of
environmentally relevant amounts of cadmium to mimic the effects of
estradiol suggests that the metal may represent a new class of
endocrine disrupters1,2,24. Not all bivalent cations activate the
estrogen receptor, however. Zinc, for example, is a bivalent cation that
binds to cysteines in the DNA binding domain to form zinc fingers,
but does not interact with the hormone binding domain or activate
the receptor1.

METHODS 

Animals. Female Sprague-Dawley rats (Harlan) were ovariectomized by the
vendor at the age of 28 d and the animals were allowed to recover for 3 weeks
before treatment with cadmium, estradiol or the antiestrogen ICI-182,780.
Cadmium chloride (Sigma) was dissolved in sterile PBS and administered as a
single intraperitoneal injection at a dose of 5 µg per kg body weight (~27
nmol/kg). An estradiol 30-d release pellet (Innovative Research of America)
was implanted subcutaneously. The antiestrogen ICI-182,780 (Tocris) was
dissolved in peanut oil and given intraperitoneally at a dose of 500 µg per kg per
d. Animals were killed either 4 or 14 d later and the effects on histology and
gene expression were examined.

Pregnant rats were obtained on day 10 of gestation and were individually
housed in standard Plexiglas cages. Pregnant animals were injected
intraperitoneally with either 0.5 or 5 µg/kg cadmium or vehicle on days 12 and 17 of
Figure 3 Effect of in utero exposure to cadmium
on the mammary gland in female offspring. (a-c)
Effects of in utero exposure to cadmium on
mammary epithelial area (a), number of terminal
end buds (TEBs; b) and number of alveolar buds
(ABs; c). Values represent the mean of six
animals per group ± s.e.m. Data were analyzed
by one-way ANOVA: F(2,15) = 5.2, P < 0.02 for
epithelial area; F(2,14) = 11.6, P < 0.001 for
TEBs; and F[2 U) = 2.7, P < 0.01 for ABs. *,
P < 0.05 (significantly different from controls).
gestation. Two days after birth, the female offspring were cross-fostered. Ten
female pups were housed with a lactating dam and were weaned on postnatal
day 22. Thereafter, animals were housed in groups of three to five. Vaginal
opening was determined by an investigator who was blinded to treatment [AU:
OK AS EDITED?]. Data were analyzed by one-way ANOVA followed by Fisher's
LSD method. These studies were approved by the Georgetown University
Animal Care and Use Committee.
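
As an illustration of the one-way ANOVA named above, the following sketch computes the F statistic and its degrees of freedom from raw group values. The function name and the numbers in the usage note are hypothetical and are not data from this study; the Fisher LSD post hoc step is not shown because it requires t-distribution quantiles.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, `one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])` uses made-up values and returns F = 13 with 2 and 6 degrees of freedom. With three treatment groups of six animals each, the same formulas give 2 and 15 degrees of freedom.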

Mammary whole mounts. After the animals were sacrificed, the inguinal
mammary glands were removed and fixed in ethanol-glacial acetic acid (3:1,
vol/vol). The glands were then stained with carmine alum, dehydrated and
cleared with xylenes. Mammary epithelial density was determined by
outlining a 2.05 cm2 section of the midregion of the fourth inguinal gland using
NIH Image (National Institutes of Health) [AU: OK?]. In all treatments,
epithelial density was inversely proportional to light intensity. Images were
then transferred to a Macintosh Power Station and processed using Adobe
Photoshop Software. In the in utero exposed animals, the terminal end buds
and alveolar buds were counted and the area was measured.

Immunohistochemistry. For immunohistochemical examination of casein
and whey acidic protein synthesis, 5-µm sections were deparaffinized, washed
in Tris-buffered saline (pH 7.6) containing 0.6% Tween 20 (TBS-T; Bio-Rad)
and treated with 3% hydrogen peroxide. The sections were then washed and
incubated with 10% BSA in TBS-T for 1 h. Sections were again washed in
TBS-T and incubated for 2 h at room temperature with antibodies to mouse
pan-casein (gift from G. Smith, National Cancer Institute) or mouse whey acidic
protein (gift from L. Henninghausen, National Institute of Diabetes and
Digestive and Kidney Diseases). Peroxidase staining was done using the mouse
Vectastain ABC kit (Vector Laboratories). [AU: PLEASE INCLUDE A
SENTENCE ABOUT WHY YOU USED MOUSE ANTIBODIES (AND WHY IT’S
OK) TO DETECT RAT PROTEINS.]

Real-time PCR. Total RNA was extracted from tissue with RNA STAT-60
(Tel-Test) and analyzed using the Platinum qRT-PCR Thermoscript One-Step
System (Invitrogen). The reaction was run in the presence of 300 nM of
specific primers and 200 nM of fluorescently labeled probes for PgR, C3 and
GAPDH, the constitutive control. For the detection of PgR, we used the upper
primer 5'-CTCAATGGGCTCCCTCAG-3', the lower primer
5'-TGAATCTGGCCTCAGGTAGTT-3' and the 5-carboxyfluorescein (FAM)-labeled probe
5'-CTCAAGGACAGCCTGCCCCA-3'. For complement C3, we used the
upper primer 5'-CTCAGTGACCAAGTGCCAGA-3', the lower primer
5'-TCACGATCAGGTGTTCAGC-3' and the FAM-labeled probe
5'-TTCTCCTGCAAGGGACCCCG-3' for quantitation. For GAPDH, we used the upper
primer 5'-GAACATCATCCCTGCATCCA-3', the lower primer
5'-CCAGTGAGCTTCCCGTTCA-3' and the Cy5-labeled probe
5'-CTTGCCCACAGCCTTGGCAGC-3' for measurement of the constitutive control. The reaction
conditions were 45 min at 54 °C and 5 min at 95 °C, followed by 50 cycles of 30
s at 95 °C, 30 s at 54 °C and 30 s at 68 °C. Fluorescent data were collected
during the 68 °C step using the iCycler iQ Detection System (Bio-Rad). Serial
dilutions of RNA from the uterus and mammary gland of an estrogen-treated
rat were used as a standard. The data were normalized for GAPDH expression.
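
The quantification described above (a standard curve from serial RNA dilutions, followed by normalization to GAPDH) can be sketched as follows. This is a generic relative standard-curve calculation written under stated assumptions, not the authors' analysis procedure; the function names are hypothetical and the dilution amounts and threshold cycles in the usage note are invented for illustration.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the relative input quantity."""
    return 10 ** ((ct - intercept) / slope)


def normalized_expression(target_ct, gapdh_ct, target_curve, gapdh_curve):
    """Relative target quantity divided by relative GAPDH quantity."""
    target_qty = quantity_from_ct(target_ct, *target_curve)
    gapdh_qty = quantity_from_ct(gapdh_ct, *gapdh_curve)
    return target_qty / gapdh_qty
```

As a usage example with made-up values, fitting dilutions of 100, 10, 1 and 0.1 (arbitrary units) against threshold cycles of 20.0, 23.3, 26.6 and 29.9 gives a slope of about -3.3 cycles per tenfold dilution, and a sample with a threshold cycle of 23.3 maps back to a relative quantity of about 10.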

Anodic stripping voltammetry assay. The amount of cadmium was
determined using an anodic stripping voltammetry assay. In this assay, the tissues
were ashed in a muffle furnace at 450 °C for 12-24 h. The ash was then
digested with trace metal-grade nitric acid (Fisher), the nitric acid was
removed by evaporation to near dryness and the sample was dissolved in
distilled water containing mercuric nitrate. The sample was placed in an
electrochemical cell containing a working electrode made of glassy carbon, a
platinum counter electrode and a pseudoreference electrode made of
platinum. The metal analytes were then preconcentrated in the mercury film that
forms on the working electrode. After the preconcentration step, the working
electrode voltage was scanned in the positive direction at 50-100 mV/s. The
oxidative current peak corresponding to cadmium was compared with a
standard curve to determine the amount of cadmium in the tissue sample. PARC
270 Electrochemical Analysis Software (EG&G) was used for data analysis and
manipulation.
ABSTRACT

To identify genes associated with survival from antiestrogens, both
serial analysis of gene expression and gene expression microarrays were
used to explore the transcriptomes of antiestrogen-responsive
(MCF7/LCC1) and -resistant variants (MCF7/LCC9) of the MCF-7 human breast
cancer cell line. Structure of the gene microarray expression data was
visualized at the top level using a novel algorithm that derives the first
three principal components, fitted to the antiestrogen-resistant and
-responsive gene expression data, from Fisher’s information matrix. The
differential regulation of several candidate genes was confirmed. The
basal expression and endocrine regulation of transcriptional activation
of implicated transcription factors were studied using
promoter-reporter assays.

The putative tumor suppressor interferon regulatory factor-1 is
down-regulated in resistant cells, whereas its nucleolar phosphoprotein inhibitor
nucleophosmin is up-regulated. Resistant cells also up-regulate the
transcriptional activation of cyclic AMP response element (CRE) binding and
nuclear factor κB (NFκB) while down-regulating epidermal growth factor
receptor protein expression. Inhibition of NFκB activity by ICI 182,780 is
lost in resistant cells, but CRE activity is not regulated by ICI 182,780 in
either responsive or resistant cells. Parthenolide, a potent and specific
inhibitor of NFκB, inhibits the anchorage-dependent proliferation of
antiestrogen-resistant but not antiestrogen-responsive cells. This observation
implies a greater reliance on their increased NFκB signaling for
proliferation in cells that have survived prolonged exposure to ICI 182,780.

These data from serial analysis of gene expression and gene microarray
studies implicate changes in a novel signaling pathway, involving
interferon regulatory factor-1, nucleophosmin, NFκB, and CRE binding in cell
survival after antiestrogen exposure. Cells can up-regulate some
estrogen-responsive genes while concurrently losing the ability of antiestrogens to
regulate their expression. Signaling pathways that are not regulated by
estrogens also can be up-regulated. Thus, some breast cancer cells may
survive antiestrogen treatment by bypassing specific growth inhibitory
signals induced by antagonist-occupied estrogen receptors.
INTRODUCTION

ERs7 are nuclear transcription factors, their activities being affected
by the nature of the ligand bound and the pattern of genes/proteins
expressed within cells (cellular context; Ref. 1). Antiestrogens
compete with endogenous estrogens for activation of ER, and induce both
cell cycle arrest and apoptosis in responsive cells (2). Neither the
genes regulated by antiestrogens that signal to apoptosis nor those
genes that confer an acquired antiestrogen resistance have been
identified. Nonetheless, antiestrogenic drugs are effective in both
premenopausal and postmenopausal breast cancer patients, and in the
metastatic and adjuvant settings (3). The most widely used
antiestrogen in current clinical practice is the triphenylethylene TAM. Clinical
experience with this drug likely now exceeds 10 million patient years.
When patients with metastatic disease are selected for treatment based
on the ER and PgR content of their tumors, responses are seen in up
to 75% of tumors expressing both receptors (2). TAM also reduces the
incidence of ER-positive breast cancers in high risk women (4).

Other antiestrogens have emerged recently, most notably the
benzothiophene Raloxifene and the steroidal ICI 182,780 (Faslodex).
Both drugs appear to have significant clinical activity and may have
better toxicological profiles when compared with TAM (2). Faslodex
has significant activity in TAM-resistant patients (5), consistent with
data obtained previously with TAM-resistant human breast cancer
cells selected in vitro (6).

Despite the utility of antiestrogens, most tumors that initially
respond to these drugs will recur and require alternative systemic
therapies (2). Unfortunately, the precise mechanisms that confer
resistance remain unknown. Change to an antiestrogen-stimulated
phenotype has been described in some animal models (6, 7). This
phenotype may occur in up to 20% of breast cancer patients but a loss
of responsiveness to antiestrogens may be the more common
phenotype (2). The expression of mutant ER proteins and splice variants has
been reported but the functional role of these in endocrine resistance
remains unclear (2). Most tumors acquiring antiestrogen resistance do
so while retaining expression of ER (8). Thus, whereas lack of ER
expression is a major form of de novo antiestrogen resistance, other
mechanisms must be active in most instances of acquired resistance
(2). The persistent expression of ER in tumors with acquired
resistance suggests that some cells expressing this phenotype may
require ER expression and/or reflect the altered expression of
otherwise estrogen-regulated genes.

Because ER-mediated transcription is directly affected by
antiestrogens, we initially hypothesized that antiestrogen resistance might
include perturbations in the patterns of expression and/or regulation of


7 The abbreviations used are: ER, estrogen receptor; CRE, cyclic AMP response
element; CCS-IMEM, improved minimal essential medium supplemented with 5%
charcoal-stripped calf serum; EGF-R, epidermal growth factor receptor; IRF-1, interferon
regulatory factor-1; NPM, nucleophosmin; PgR, progesterone receptor; SAGE, serial
analysis of gene expression; TAM, Tamoxifen; XBP-1, X-box binding protein-1; FACS,
fluorescence-activated cell sorting; NFκB, nuclear factor κB; EGR-1, early growth
response factor-1; TNFα, tumor necrosis factor α.
a subset of all of the ER-regulated genes (1). To address this
hypothesis, we first generated a novel series of human breast cancer variants
from the MCF-7 human breast cancer cell line. These cells have
different growth requirements for estrogen and exhibit differential
sensitivities to TAM and ICI 182,780 (9-11). In this study, we focus
on MCF7/LCC1 cells (estrogen-independent, TAM-responsive, and
ICI 182,780 responsive) and MCF7/LCC9 cells (estrogen-independent,
ICI 182,780 resistant, and TAM cross-resistant; Ref. 11).
Because the cells exhibit comparable cell cycle profiles8 and are both
MCF-7 variants, we can exclude the altered expression of genes
related solely to differences in both genetic background and cell cycle
distribution. A direct comparison of these respective transcriptomes
should identify genes associated with survival from long-term
antiestrogen exposure.

Several techniques are now available to explore the transcriptomes
of tumors and experimental models. However, the most effective
approach remains a matter of debate (12). Studies in breast cancer
have been limited, most simply attempting to identify the genes
expressed in breast cancers. For example, a recent study by Perou
et al. (13) explored data from excisional breast biopsies from 42
individuals. Gene clusters, identified by exploration of the data
structure, include those associated with ER, HER-2, and IFN-induced
genes. A similar cluster of IFN-regulated genes was identified in the
breast cancer cell lines included in the NIH drug screening program
(14). Studies comparing the gene expression profiles of specific breast
cancer phenotypes include an examination of histologically different
samples from a single breast cancer lesion (15) and a preliminary
analysis of a TAM-stimulated xenograft model (16). None of these
reports directly addressed either the function or potential role of the
specific genes identified. We have used two different but
complementary approaches, SAGE and gene expression microarrays. These
approaches would not be expected to provide identical data because not
all of the genes identified by SAGE are on the microarrays, some
genes identified on the cDNA arrays may be confounded by
cross-hybridization to homologous RNAs, and the ability to detect
significant differences between the SAGE databases is affected by the
relative abundance of the tags and the size of the databases. We
approached both technologies as means to sample the transcriptomes
of MCF7/LCC1 and MCF7/LCC9 cells, and to generate data that
would allow us to begin testing our hypothesis implicating
estrogen-regulated genes in antiestrogen resistance. We now show that cells can
survive prolonged antiestrogen treatment by altering the expression,
patterns of regulation, and functional activation of specific
estrogen-regulated genes.

MATERIALS  AND  METHODS 

Cell Lines. MCF7/LCC1 cells were derived from the estrogen-dependent
MCF-7 human breast cancer cell line after selection for growth in
ovariectomized nude mice (9, 17). MCF7/LCC9 cells were obtained by an in vitro
stepwise selection of the estrogen-independent but antiestrogen-responsive
MCF7/LCC1 cells against the steroidal antiestrogen ICI 182,780 (Faslodex).
MCF7/LCC9 cells are ICI 182,780 resistant and TAM cross-resistant, express
ER and PgR, and exhibit an estrogen-independent but responsive phenotype
(11). MCF7/LCC1 and MCF7/LCC9 cells were routinely passaged in
Improved Minimal Essential Medium without phenol red (Biofluids, Bethesda,
MD) supplemented with 5% CCS-IMEM. Serum was stripped of endogenous
estrogens  as  described  previously  and  is  estimated  to  contain  r£H)  fM  estrogen 
(18). Vehicle for all of the hormone/antihormone treatments was ethanol (final
concentration <0.1% v/v). All of the cell cultures were maintained at 37°C in
a humidified 5% CO2:95% air atmosphere and shown to be free of
contamination with Mycoplasma species as determined by solution hybridization to


8  R.  Clarke,  unpublished  observations. 


Mycoplasma-specific, radiolabeled, RNase riboprobes (Gen-Probe Inc., San
Diego, CA).

SAGE Analyses. SAGE was performed as described previously (19).
Polyadenylic acid mRNA was harvested from cells using biotin-labeled
oligodeoxythymidylic acid magnetic beads (Promega PolyATract System 1000
kit; Promega, Madison, WI) and treated with DNase I enzyme to remove any
contaminating DNA. mRNA (5 µg) was converted to double-stranded cDNA
using the Life Technologies, Inc. cDNA Synthesis kit (Life Technologies, Inc.,
Rockville, MD). Biotinylated cDNA was completely cleaved with NlaIII and
the 3'-end digested fragments extracted with magnetic streptavidin beads. The
cDNA was evenly divided and ligated, one half to linker A and the other half
to linker B (19). Cleavage of the cDNA by BsmFI produced 11-13 bp oligo
DNA tags with linkers, which were blunt-ended with T4 polymerase. Linkers
A and B were ligated together to form ditags, which were then amplified by
PCR using primers to linkers A and B. Ditags (22-26 bp) were gel purified and
ligated into concatenated polytags. The polytags were purified and cloned into
the SphI-digested pZero-1 vector, which was transferred to competent
TOP10F' cells by electroporation. Positive clones were selected overnight at
37°C for growth on low-salt Luria-Bertani bacterial plates supplemented with
Luria-Bertani-Zeocin (50 µg/ml) and isopropyl β-D-thiogalactopyranoside (1
mM). Colonies were screened for plasmids containing appropriate inserts by
size fractionating PCR products, obtained using M13 forward and reverse
primers, in agarose gels. PCR products containing concatamers of >600 bp
were purified and sequenced.

Characteristics of the SAGE databases are shown in Table 1. We compared
the MCF7/LCC1 and MCF7/LCC9 databases, using the SAGE version 3.00
software (kindly provided by Dr. K. W. Kinzler, Johns Hopkins University,
Baltimore, MD), to identify putatively differentially expressed genes. Only a
representative sample of these can be presented. The genes presented in
Table 2 were primarily selected based on: (a) fold difference ≥2-fold; (b) that
the Tags compared should represent <2 genes; and (c) that a Tag found in
either the MCF7/LCC1 and/or MCF7/LCC9 SAGE libraries must represent
≥0.10% of the database. Evidence that a gene was already known to be
expressed in breast cancers also was considered. None of these criteria were
considered an absolute requirement for gene selection. Whereas 2-fold was
selected as the cutoff, biologically critical events can be controlled by genes
that exhibit a fold regulation as small as 50% (20). As described recently by
Man et al. (21), χ2 analyses were used to compare the proportions of specific
tags in each database.
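
A comparison of tag proportions between two SAGE libraries can be sketched as a chi-square test on a 2 × 2 table (tag versus all other tags, in each library). The sketch below is an assumption about how such a test might be set up, not the authors' code; in particular, the Yates continuity correction is an assumption, since the text does not say whether a correction was applied. The function name is hypothetical, and the library sizes and tag counts in the usage note are taken from Tables 1 and 2.

```python
import math


def chi_square_tag_test(count_a, total_a, count_b, total_b, yates=True):
    """Chi-square test (1 degree of freedom) comparing one tag's proportion in two libraries.

    Returns (chi2, p). With yates=True the Yates continuity correction is applied,
    which is an assumption rather than a documented detail of the original analysis.
    """
    observed = [[count_a, total_a - count_a],
                [count_b, total_b - count_b]]
    grand = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [count_a + count_b, grand - count_a - count_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            chi2 += diff ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

For example, `chi_square_tag_test(7, 12816, 34, 11109)` compares the cathepsin D tag counts against the library sizes from Table 1 and gives P well below 0.001, whereas `chi_square_tag_test(10, 12816, 14, 11109)` for NPM gives P above 0.05. With the correction enabled, the result for L14 (13 versus 2 tags) is close to the 0.021 listed in Table 2, which is suggestive but does not establish which variant of the test was used.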

RNA Isolation, Generation of Probes, and Hybridization of Gene
Microarrays. Each probe was generated from an independent cell culture, each
culture being grown on a different day but using identical cell culture
conditions. Six MCF7/LCC1 and five MCF7/LCC9 cell cultures were used. RNA
was isolated from proliferating, subconfluent monolayers of each cell line
using the TRIzol reagent (Life Technologies, Inc., Grand Island, NY). RNA
quality was determined by standard spectroscopic and gel electrophoresis
analyses.

Probes for the Clontech Atlas gene microarrays (Clontech, Palo Alto, CA)
were made as described by the manufacturer. Briefly, 1 µg of DNase-treated
mRNA was primed with the Clontech cDNA Synthesis Primer mix. The
product was reverse transcribed into radiolabeled cDNA with [γ-32P]dATP
(Amersham Life Science Inc., Arlington Heights, IL), and the reaction
incubated at 50°C for 25 min and terminated by adding 0.1 M EDTA (pH 8.0).
Radiolabeled cDNA was purified and eluted through a NucleoSpin Extraction
Column (centrifuged at 14,000 rpm). The cDNA probe was denatured with 1


Table 1 Characteristics of the SAGE libraries from MCF7/LCC1 and
MCF7/LCC9 cells

Characteristics of SAGE libraries                  Tagsᵃ    Gene hits
Tags sequenced from MCF7/LCC1 cells     12,816ᵇ    5,783    1
Tags sequenced from MCF7/LCC9 cells     11,109ᵇ    1,170    2
Number of Tags identified               10,518       208    3
Number of known Tagsᶜ                    7,221        38    4
Number of unknown Tags                   3,297        10    5

ᵃ Number of Tags representing a corresponding number of gene hits, e.g., 5,783 Tags
are specific for single genes, whereas 208 Tags could identify up to 3 genes each.
ᵇ Number of Tags in each SAGE database.
ᶜ Includes expression sequence tags.
Table 2 Differentially expressed genes identified in the MCF7/LCC1 and MCF7/LCC9 SAGE libraries

Putative geneᵃ                  Unigene no.  MCF7/LCC1  MCF7/LCC9  Differenceᵇ  Pᶜ      Gene function
N-ras-related gene              Hs.260523    n          20         10-fold      <0.001  G-protein
Cathepsin D                     Hs.343475    7          34         5-fold       <0.001  Protease involved in tumor invasion
XBP-1                           Hs.149923    7          os         4-fold       <0.001  Transcription factor
Prefoldin 5                     Hs.288856    6          21         4-fold       0.002   Chaperone for unfolded proteins
HSP-27                          Hs.76067     23         55         2-fold       0.001   Stress response protein
Vit B-12-binding protein        Hs.2012      17         37         2-fold       0.002   Vitamin-binding protein
NPMᵈ                            Hs.9614      10         14         1.5-fold     >0.05   Oncogenic nucleolar protein
L14                             Hs.738       13         2          6-fold       0.021   Ribosomal protein
Death-associated protein-6      Hs.336916    11         2          6-fold       0.049   Apoptosis-associated protein
EF-7                            Hs.2186      22         6          4-fold       0.014   Translation elongation factor
Ferritin, heavy polypeptide-1   Hs.62954     54         16         3-fold       <0.001  Iron-binding protein

ᵃ The gene designations are considered putative, although the identity of most genes designated in this fashion has been shown to be correct. These genes include those Tags where:
(a) the fold difference is >2-fold; (b) the Tag could represent <2 genes; and (c) represents 0.1% of either the MCF7/LCC1 and/or MCF7/LCC9 SAGE library.
ᵇ Predicted fold difference in gene expression between MCF7/LCC1 vs. MCF7/LCC9 cells.
ᶜ Obtained by χ2 analyses; P estimated to 3 significant figures.
ᵈ NPM (not statistically significant) is shown because we know it to be both estrogen regulated and associated with TAM treatment in patients.


M NaOH and 10 mM EDTA, and incubated at 68°C for 20 min. C0t-1 DNA and
1 M NaH2PO4 (pH 7.0) were added to the denatured probe, and incubated at
68°C for an additional 10 min.

Each Atlas Array (Clontech) was prehybridized with 5 ml of ExpressHyb
buffer (Clontech) and 0.5 mg of denatured DNA from sheared salmon testes at
68°C for 30 min with continuous agitation. The cDNA probe, prepared as
described above, was then added and allowed to hybridize overnight. The array
was washed four times with 2× SSC containing 1% (w/v) SDS for 30 min at
68°C and once with 0.1× SSC containing 0.5% (w/v) SDS for 30 min at 68°C.
One final wash was performed with 2× SSC for 5 min at room temperature.
The Atlas Array was sealed in plastic and signals detected by phosphorimage
analysis using a Molecular Dynamics Storm phosphorimager (Molecular
Dynamics, Sunnyvale, CA). Each filter was used only once.

Measuring NPM and EGF-R Protein Levels. Established methods were
used for performing and quantifying Western analyses of NPM (22, 23).
Briefly, 10 µg of protein was loaded onto an SDS-PAGE gel and
fractionated under reducing conditions [5% (v/v) β-mercaptoethanol]. To account
for within-gel differences, samples were loaded in a random sequence onto
each gel. Proteins were blotted onto nitrocellulose membrane and the blots
probed with an anti-NPM monoclonal antibody (kindly provided by Dr.
Pui-Kwong Chan, Baylor College of Medicine, Houston, TX; Ref. 24).
After transfer to the membranes, equal protein loading was confirmed by
staining the nitrocellulose with Ponceau S as is widely reported (22, 23,
25). Any material remaining in the gels was stained with Coomassie Blue.
This approach provides an adequate and appropriate estimate for
equivalence of protein loading (22, 23, 25). Immunoreactivity was visualized
using a horseradish peroxidase-linked goat antimouse IgG and the
enhanced chemiluminescence detection system (Amersham Life Science
Inc.). Chemiluminescence was densitometrically measured using a
Quantity One Scanning and Analysis System (pdi, Huntingdon, NY).

EGF-R is expressed at low levels in MCF-7 cells and cannot readily be
detected/quantified by Western blot. Consequently, we measured
immunofluorescently labeled EGF-R protein by FACS. For each cell line, EGF-R
immunofluorescence was performed by rinsing 5 X 10*' cells once in PBS and
pelleting cells by centrifugation at 1000 rpm for 5 min at room temperature.
Cell pellets were resuspended in 100 µl of an anti-EGF-R mouse monoclonal
antibody that recognizes the extracellular domain of the receptor (EGF-R
antibody-1; NeoMarkers, Lab Vision Corp., Fremont, CA; 200 µg/ml diluted
1:50 in PBS), and incubated at room temperature for 1 h. Cell pellets were then
resuspended in 1:50 dilution of R-phycoerythrin-conjugated goat antimouse
IgG-2a (CALTAG Laboratories, Burlingame, CA) and incubated in the dark
for 30 min. After rinsing in PBS, cells were again pelleted, fixed by
resuspending in 1% paraformaldehyde, and fluorescence measured by FACS.
Control cells were treated either with secondary antibody alone or with no
antibody. FACS was performed on a FACStarplus flow cytometer
(Becton-Dickinson, Mountain View, CA) at 488 nm.

RNase Protection Analysis of IFN Regulatory Factor-1 mRNA Expression.
Total RNA was isolated using the TRIzol reagent (Life Technologies,
Inc.) according to the manufacturer’s instructions. The IRF-1 riboprobe was
made by in vitro transcription of a 360-bp fragment of the IRF-1 cDNA. The
36B4 loading control riboprobe was similarly obtained from a 220-bp fragment
of the 36B4 cDNA (17). Riboprobes were labeled by the addition of [32P]UTP
(Amersham Life Sciences Inc.) in the transcription buffer. To achieve bands
for the two genes with similar intensities, the 36B4 riboprobe was made with
a specific activity of ~20% that of the IRF-1 riboprobe. The RNase protection
assays were performed as described previously (26). Briefly, total RNA (30
μg), the IRF-1 riboprobe, and the 36B4 riboprobe were hybridized overnight
at 50°C. After digestion with RNase A, the protected fragments were size
fractionated on 6% acrylamide Tris-borate EDTA-urea mini gels (Novex, San
Diego, CA). The gels were dried and the respective signals quantified by
phosphorimager analysis (Molecular Dynamics).

Estimation of the Transcriptional Activation of CREs and NFkB. Two
commercially available promoter-reporter assays were used to measure NFkB
and CRE transcriptional activities. Experiments were performed as described
by the manufacturer (Stratagene, La Jolla, CA). Briefly, firefly luciferase
reporter constructs, under the control of the appropriate enhancer elements and
trans-activator constructs, were provided in the PathDetect in vivo signal
transduction pathway cis-reporting system (Stratagene). Cells were grown to
90% confluence in 5% CCS-IMEM medium and seeded at 8 × 10^4 cells into
each well of 24-well tissue culture dishes. After incubation for 12-24 h, cells
were transiently transfected with the appropriate plasmids using the Qiagen
Superfect transfection reagent as described by the manufacturer (Qiagen,
Valencia, CA). The ratio of plasmid to Superfect reagent was 250 ng:1 μl, with
a transfection time of 2.5 h.

Estrogen (5 nM) and ICI 182,780 (10 nM) treatments were administered for
48 h after transfection in CCS-IMEM. Transfected cells were harvested and
firefly luciferase activity measured using the Stratagene assay system. Activity
is expressed in relative light units from a 20-μl sample as detected by
luminometry. Each measurement is from duplicate samples, with independent
experiments repeated on different days. Transfection efficiency was
normalized to a Renilla luciferase reporter construct under the
control of the cytomegalovirus promoter (Promega). The Renilla luciferase
assay was performed using the Promega Dual-Luciferase reporter assay system.
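
The normalization described above can be illustrated with a short sketch. It assumes that each well yields one firefly and one Renilla reading, that normalized activity is the firefly/Renilla ratio, and that fold induction is the ratio of mean normalized activities. The function names and the readings are hypothetical and are not taken from either manufacturer's protocol.

```python
from statistics import mean

def normalized_activity(firefly_rlu, renilla_rlu):
    """Firefly signal corrected for transfection efficiency (Renilla signal)."""
    if renilla_rlu <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly_rlu / renilla_rlu

def fold_induction(sample_wells, reference_wells):
    """Mean normalized activity of the sample relative to the reference."""
    sample = mean(normalized_activity(f, r) for f, r in sample_wells)
    reference = mean(normalized_activity(f, r) for f, r in reference_wells)
    return sample / reference

# Hypothetical duplicate wells: (firefly RLU, Renilla RLU). Not study data.
lcc1_wells = [(1000.0, 500.0), (1200.0, 600.0)]
lcc9_wells = [(8000.0, 400.0), (10000.0, 500.0)]

nfkb_fold = fold_induction(lcc9_wells, lcc1_wells)
print(f"Fold induction relative to MCF7/LCC1: {nfkb_fold:.1f}")
```

Dividing by the Renilla signal before averaging keeps a poorly transfected well from dominating the mean, which is one reasonable reading of the normalization step; other orderings are possible.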

Assessment of Growth Response to Parthenolide. MCF7/LCC1 and
MCF7/LCC9 cells were plated in 96-well tissue culture plates and incubated
for 24 h in 0.2 ml of 5% CCS-IMEM. Medium was removed and replaced with
fresh 5% CCS-IMEM containing either vehicle (0.1% DMSO) or parthenolide
(300 nM and 600 nM). Cells were refed every third day with the appropriate cell
culture medium. Cell growth was determined on day 6, using a crystal violet
assay where dye uptake is directly related to cell number (27). Cells were
incubated for 30 min with crystal violet stain [0.5% (w/v) crystal violet in 25%
(v/v) methanol] at 25°C. Unincorporated stain was removed with deionized
water and the cells allowed to dry at room temperature. Incorporated dye was
extracted into 0.1 ml of 0.1 M sodium citrate in 50% (v/v) ethanol for 10-15
min at room temperature. Absorbance was read at 570 nm using a Molecular
Devices Vmax kinetic microplate reader.
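
As an illustration of how such absorbance readings can be summarized, the sketch below expresses each treated well as a percentage of the mean vehicle-control absorbance for the same cell line. The function name and the A570 values are hypothetical, and no blank subtraction is assumed.

```python
from statistics import mean

def percent_of_control(treated, control):
    """Express each treated-well absorbance as a percentage of the mean control absorbance."""
    control_mean = mean(control)
    if control_mean <= 0:
        raise ValueError("control absorbance must be positive")
    return [100.0 * value / control_mean for value in treated]

# Hypothetical A570 readings (four wells each) for one cell line. Not study data.
vehicle_wells = [1.00, 1.10, 0.90, 1.00]
parthenolide_wells = [0.50, 0.55, 0.45, 0.50]

growth = percent_of_control(parthenolide_wells, vehicle_wells)
mean_growth = mean(growth)
print(f"Mean growth: {mean_growth:.0f}% of vehicle control")
```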

Statistical Analyses and Analysis of Gene Expression Microarray Data.
t tests were used to compare control and experimental groups as appropriate for
the RNase protection, Western blot, promoter-reporter, and cell proliferation
assays. All of the tests were two-tailed, with statistical significance established
at P ≤ 0.05, unless stated otherwise.

For the gene array studies, background signal was estimated locally and
subtracted from the signal obtained from its target cDNA, producing the
background-corrected data. These corrections were done using the algorithms
in Pathways 4.0 (Research Genetics Inc., Huntsville, AL). Background-corrected
data were normalized to account for differences in probe-specific
activity, hybridization, and other variables among replicates (28).
Normalization was accomplished using the mean value of all of the background-corrected
signals on each array.
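
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: it is not the Pathways 4.0 algorithm, negative corrected values are clipped to zero (an assumption), and the intensities are hypothetical.

```python
def background_correct(signal, background):
    """Subtract each spot's local background; clip negatives to zero (an assumption)."""
    return [max(s - b, 0.0) for s, b in zip(signal, background)]

def normalize_to_array_mean(corrected):
    """Divide every background-corrected signal by the mean signal of its array."""
    mean_signal = sum(corrected) / len(corrected)
    if mean_signal == 0:
        raise ValueError("array mean is zero; cannot normalize")
    return [value / mean_signal for value in corrected]

# Hypothetical raw intensities and local backgrounds for four spots on one array.
raw = [120.0, 260.0, 80.0, 540.0]
local_bg = [20.0, 60.0, 30.0, 40.0]

corrected = background_correct(raw, local_bg)
normalized = normalize_to_array_mean(corrected)
print(corrected)
print([round(value, 3) for value in normalized])
```

After this normalization every array has a mean signal of 1, so values from replicate arrays can be compared on a common scale.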

Different approaches have been used to analyze data from gene array
studies. Some methods are simply based on fold-regulation (29), others are
more statistically based (16, 30), and/or apply an informatics-based exploration
of data structure (31, 32). The optimal approach remains a subject of
considerable debate (30). As with most gene microarray studies, our data set is high
in dimensionality (597 dimensions) but the number of replicates is limited by
the resource-intensive nature of the technology. The relatively small number of
replicates limits the applicability of normal mixture models and other analyses
that can operate in high dimensional data space (33, 34) and often generates
noisy data sets.

Previously, we have reported a hierarchical visualization algorithm that can
reveal all of the major aspects of the multimodal data points, which
concurrently exist in a high dimensional gene expression space (35, 36). Using this
algorithm, our data can be projected from 597 dimensions to two or three
dimensions (multidimensional scaling). This is accomplished by respectively
deriving the first three principal components fitted to the antiestrogen-responsive
(MCF7/LCC1) and resistant (MCF7/LCC9) gene expression data (Fig. 1).
Thus, we evaluate the data structure subsets visually and assess whether these
contain differentially expressed genes that may contribute to the respective
phenotypes.
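
The principal component projection mentioned above can be illustrated with a small sketch. This is plain principal component analysis computed by power iteration on the array-by-array Gram matrix, not the hierarchical visualization algorithm of refs. 35 and 36. All function names and the toy data (six arrays, four genes) are hypothetical.

```python
import math

def center_columns(rows):
    """Subtract each gene's mean (column mean) from every array (row)."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - m for value, m in zip(row, means)] for row in rows]

def gram_matrix(rows):
    """Array-by-array inner products; small even when genes number in the hundreds."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

def top_eigenpairs(matrix, k, iterations=500):
    """Leading eigenpairs of a symmetric matrix by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 / (i + 1) for i in range(n)]  # fixed, non-constant start vector
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(n) for j in range(n))
        pairs.append((value, vec))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs

def pca_projection(rows, k=3):
    """Return per-array scores on the top k components and the variance fractions."""
    centered = center_columns(rows)
    gram = gram_matrix(centered)
    total = sum(gram[i][i] for i in range(len(gram)))
    pairs = top_eigenpairs(gram, k)
    scores = [[math.sqrt(max(val, 0.0)) * vec[i] for val, vec in pairs]
              for i in range(len(rows))]
    explained = [val / total for val, _ in pairs]
    return scores, explained

# Hypothetical normalized expression profiles (rows = arrays, columns = genes).
lcc1 = [[10, 2, 5, 1], [11, 2, 5, 1], [10, 3, 5, 1]]
lcc9 = [[2, 9, 5, 1], [3, 9, 5, 2], [2, 10, 5, 1]]

scores, explained = pca_projection(lcc1 + lcc9, k=3)
print("Cumulative variance captured:", round(sum(explained), 3))
```

Working with the Gram matrix keeps the eigenproblem at the size of the number of arrays rather than the number of genes, which suits data with many genes and few replicates.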

Because we can visualize data structure, our next priority was to identify a
simple, supervised approach for reducing the dimensionality of the data
without affecting its structure. Thus, we applied geometric and simple descriptive
statistical approaches to the normalized data before and after a logarithmic
transformation of these data. As noted previously, the distribution of the
expression data for each gene is unknown (30), and it is unclear whether these
violate the normal distribution required for parametric analyses. Indeed, it
seems likely that the distribution assumption required will be normal for some
genes and not for others. Whereas most investigators analyze data transformed
by a logarithmic function, those genes with values that appear normally


Fig. 1. Visual representations of the structure of the multidimensional gene microarray
data. A, three-dimensional representation of 597 dimensions (△, MCF7/LCC1; ○, MCF7/
LCC9) where the top three principal components capture 81.2% of the cumulative
variance in the data. B, three-dimensional representation of 7 dimensions (data from Table
3) where the top three principal components capture 98.9% of the cumulative variance in
the data. Axes represent the first three principal components derived from the gene
expression data (79, 80). Plots are rotated to provide the optimal visualization. In both
plots, a plane is shown demonstrating the linear separability of the MCF7/LCC1 (n = 5)
and MCF7/LCC9 (n = 4) gene expression profiles.


distributed  before  transformation  may  no  longer  have  this  distribution  once 
transformed. 

To be inclusive, we used simple statistics (t tests) to explore the data. The
inflated type I error from multiple comparisons should overestimate (false
positive) significant differences. We considered this preferable to a high
incidence of false-negative estimates, which would lead to the exclusion of
potentially informative genes. The inclusion of uninformative genes (false
positives) is less problematic at this stage of the exploration. We used
Student’s t test, a t test for unequal variance (assumes normal distribution), and
the nonparametric (distribution-free) Wilcoxon signed rank test. Logarithm
transformed and nontransformed data were explored. This approach is similar
to using an F test as described recently by Hedenfalk et al. (37).
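
For readers unfamiliar with the two parametric tests named above, the sketch below computes the pooled-variance (Student) and unequal-variance (Welch) t statistics with their degrees of freedom, on both untransformed and natural-log-transformed values. The expression values are hypothetical, P values are not computed (they would need a t distribution function), and the Wilcoxon test is not shown.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Unequal-variance t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical normalized signals for one gene (not data from this study).
lcc1 = [2.0, 2.4, 1.8, 2.2, 2.1]
lcc9 = [1.0, 1.1, 0.9, 1.2]

for label, transform in (("untransformed", lambda x: x), ("log-transformed", math.log)):
    a = [transform(v) for v in lcc1]
    b = [transform(v) for v in lcc9]
    t_pooled, df_pooled = student_t(a, b)
    t_welch, df_welch = welch_t(a, b)
    print(f"{label}: Student t = {t_pooled:.2f} (df = {df_pooled}), "
          f"Welch t = {t_welch:.2f} (df = {df_welch:.1f})")
```

Both functions fail with a division by zero if every value in both groups is identical, so constant genes would need to be filtered first.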

t test results were evaluated and candidate genes selected with which to
reconstruct a lower dimensional data set that should retain most of the
information apparent in the top level visualization. However, the t test results were
only one of several criteria used to guide gene selection, and only a subset of
those genes that appear to be differentially regulated are presented. These
genes were selected by comparing the results of t tests on logarithm
transformed and untransformed data, fold-regulation (~2-fold or greater was
selected because this difference is likely to be confirmed in independent
analyses), the distribution of the background-corrected and normalized data for each
gene (some genes appeared strongly differentially regulated but did not
generate statistically significant differences because of heterogeneity in the data),
and the probable relevance to breast cancer of each gene.
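
The fold-regulation criterion alone can be sketched as below. It covers only one of the several criteria listed above; the gene names, mean values, and the exact threshold of 2.0 are hypothetical choices for illustration.

```python
def fold_regulation(mean_lcc1, mean_lcc9):
    """Return (fold, direction), where fold is the larger mean over the smaller."""
    if mean_lcc1 <= 0 or mean_lcc9 <= 0:
        raise ValueError("mean expression values must be positive")
    if mean_lcc1 >= mean_lcc9:
        return mean_lcc1 / mean_lcc9, "higher in MCF7/LCC1"
    return mean_lcc9 / mean_lcc1, "higher in MCF7/LCC9"

def select_candidates(means, threshold=2.0):
    """Keep genes whose fold-regulation meets the threshold."""
    selected = {}
    for gene, (m1, m9) in means.items():
        fold, direction = fold_regulation(m1, m9)
        if fold >= threshold:
            selected[gene] = (round(fold, 2), direction)
    return selected

# Hypothetical mean normalized signals: gene -> (MCF7/LCC1, MCF7/LCC9).
gene_means = {
    "gene_a": (2.0, 1.0),
    "gene_b": (1.0, 1.5),
    "gene_c": (0.5, 1.6),
}

candidates = select_candidates(gene_means)
for gene, (fold, direction) in candidates.items():
    print(gene, fold, direction)
```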

Where  the  gene  subsets  (reduced  dimensional  data)  provide  a  reasonable 
description  of  the  entire  expression  data,  the  replicate  profiles  of  the  resistant 
and  responsive  cells  should  exist  in  separable  data  space  (35,  36).  Furthermore, 
if the profiles are adequately defined by a small, rational gene subset, some of
its  members  likely  represent  differentially  expressed  and  functionally  relevant 
genes.  We  acknowledge  that  our  approach  is  limited,  and  is  probably  only 
applicable  to  simple  comparisons  within  related  cell  culture  models. 
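
One simple way to check whether two groups of projected profiles occupy linearly separable regions is a perceptron, which finds a separating plane whenever one exists. This is an illustrative stand-in and is not stated to be the method used in refs. 35 and 36; the coordinates are hypothetical, and a result of None after the epoch limit does not prove that the groups are inseparable.

```python
def find_separating_plane(points_a, points_b, max_epochs=1000):
    """Perceptron search for (weights, bias) separating two groups; None if not found."""
    dim = len(points_a[0])
    weights = [0.0] * dim
    bias = 0.0
    labelled = [(p, 1) for p in points_a] + [(p, -1) for p in points_b]
    for _ in range(max_epochs):
        errors = 0
        for point, label in labelled:
            activation = sum(w * x for w, x in zip(weights, point)) + bias
            if label * activation <= 0:  # misclassified or on the plane
                weights = [w + label * x for w, x in zip(weights, point)]
                bias += label
                errors += 1
        if errors == 0:
            return weights, bias
    return None

# Hypothetical three-dimensional principal component scores for replicate arrays.
lcc1_scores = [(2.0, 0.1, 0.0), (2.2, -0.1, 0.1)]
lcc9_scores = [(-2.0, 0.0, 0.1), (-1.9, 0.2, -0.1)]

plane = find_separating_plane(lcc1_scores, lcc9_scores)
print("Separating plane found:", plane is not None)
```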

RESULTS 

Genes Implicated by SAGE. The data in Table 1 show the number
of different genes identified. Most genes were commonly expressed
and were not differentially expressed between the MCF7/LCC1
and MCF7/LCC9 cells. A selection of the genes identified by
SAGE,  and  predicted  to  be  differentially  expressed  in  MCF7/LCC1 
and  MCF7/LCC9  SAGE  databases,  is  shown  in  Table  2.  Presentation 
of  all  of  the  genes  expressed  and/or  differentially  expressed  is  beyond 
the  scope  of  a  single,  focused  study.9  The  criteria  applied  for  gene 
selection are described in “Materials and Methods.” NPM was included
because we already know it to be both estrogen regulated (23)
and indirectly associated with TAM treatment in patients (38).
Confirmation of the differential expression of NPM (see Table 2 and
Fig. 2B) and altered CRE binding activity (the function of XBP-1; see
Table 2 and Fig. 3B) indicates that these represent reasonable criteria
for gene selection. Currently, XBP-1 and NPM are the only genes
from the SAGE database comparisons for which we have attempted to
confirm differential expression/activation.

Comparing the SAGE databases identifies several genes that are
up-regulated in MCF7/LCC9 cells compared with MCF7/LCC1 cells.
These genes include XBP-1, NPM, cathepsin D, HSP-27, and n-ras.
Increased CRE activity is indicated by the up-regulation of XBP-1,
which regulates gene transcription through these response elements
(39). XBP-1 is involved in regulating the expression of several
tissue-specific genes including tissue inhibitor of metalloproteinases,
osteopontin, and osteocalcin (40). Significantly, both Perou et al. (13)
and West et al. (41) recently identified XBP-1 as being associated
with ER gene expression clusters in human breast tumor biopsies.
NPM is induced by estrogen in MCF-7 cells and is up-regulated in
estrogen-independent cells (23). NPM also provokes an autoimmune
Fig. 2. Confirmation of the differential expression of
NPM, EGF-R, and IRF-1 in MCF7/LCC1 and MCF7/LCC9
cells. A, EGF-R protein immunofluorescence as measured by
FACS (representative figure of three experiments). Arrows
indicate EGF-R signal; other signals are controls (no antibody;
primary antibody but no secondary antibody). Axes are:
abscissa, fluorescence; ordinate, cell counts. B, NPM
protein as measured by Western blotting (*P ≤ 0.02) and
represented as a percentage of control (MCF-7 cells growing
in CCS-IMEM); bars, ±SE. Insert, representative Western
blot. C, IRF-1 mRNA as measured by RNase protection
(*P = 0.005, three independent replicate experiments)
and expressed in phosphorimager units; bars, ±SE.
Insert, representative analysis; 36B4 is a ribosomal gene
(loading control).
response in breast cancer patients, the magnitude of which is associated
with TAM therapy (38).

The  altered  expression  of  cathepsin  D  is  consistent  with  our  data 
published  previously,  showing  increased  secretion  of  this  protein  in 
several  of  our  hormone-independent  MCF-7  variants  (42).  Cathepsin 
D  expression  in  breast  tumors  also  is  associated,  at  least  in  some 
studies,  with  a  poor  prognosis  (43).  HSP-27  expression  has  been 
implicated  in  refining  the  diagnosis  of  suspicious  fine-needle  aspirates 
of breast tissues (44). Vitamin B12 binding proteins are expressed in
breast tumors (45), and vitamin B12 deficiency is a likely risk factor
for breast cancer (46). Altered expression of the n-ras-related gene is
consistent  with  the  elevated  ras  signaling  reported  in  some  breast 
cancer  cell  lines  and  tumors  (47). 

SAGE also identified genes expressed at higher levels in the
parental, antiestrogen-responsive cells (MCF7/LCC1) when compared
with MCF7/LCC9 cells. These include ferritin, death-associated
protein-6, and the eukaryotic elongation factor-γ. Ferritin is expressed
in  breast  cancers,  and  breast  tumor-derived  ferritin  may  be  a  more 
useful  tumor  marker  than  measuring  levels  of  ferritin  in  serum  (48). 

Structure  of  the  Gene  Microarray  Data.  It  has  been  suggested 
that  the  cost  required  to  perform  gene  microarray  studies  can  be 
reduced  by  combining  RNA  populations  from  several  replicates  and 
performing  a  single  hybridization  on  an  Atlas  array  (16).  However,  we 
found heterogeneity among replicate experiments, which often remained
after normalization. Logarithmic transformation of these data
reduced this heterogeneity but not to the point where a single replicate
could be used to obtain an adequate description of the data. Consequently,
multiple replicates are required to provide a more reliable
Fig. 3. Basal transcriptional activity of NFkB and CRE in MCF7/LCC1 and MCF7/
LCC9 cells. A, NFkB. B, CRE. Data represent mean and are expressed as fold induction
relative to MCF7/LCC1; bars, ±SE. All cells were grown in the absence of estrogens
(CCS-IMEM).
Table 3 Representative list of differentially expressed genes identified by gene
microarray analyses

Gene(a)   UniGene no.   MCF7/LCC1(b)   MCF7/LCC9   Gene function
NFkB      Hs.75569      1              2           Transcription factor involved in cell survival signaling
SOD       Hs.75428      1                          Enzyme involved in detoxifying oxygen radicals
EGR-1     Hs.326035     3              1           Transcription factor
EGFR      Hs.77432      n              1           Growth factor receptor
IRF-1     Hs.80645      2              1           Transcription factor involved in signaling to cell cycle arrest and apoptosis
TNFα      Hs.241570     2              1           Cytokine
TNF-R1    Hs.159        2              1           Cytokine receptor involved in signaling to apoptosis

(a) Abbreviations are: SOD, superoxide dismutase; TNF-R1, tumor necrosis factor-
receptor 1.

(b) Data are represented as level of expression relative to the other cell line. Data are
based on the mean values for each gene (6 microarrays of MCF7/LCC1; 5 microarrays of
MCF7/LCC9). Values are expressed to the nearest integer.

estimate  of  the  putative  gene  expression  profiles.  These  observations 
on  filter  microarrays  are  consistent  with  recent  reports  for  glass 
slide-based and oligonucleotide array-based gene expression
microarrays (49, 50).

Fig. 1A is a visual representation of the multidimensional data (597
dimensions)  in  three  dimensions.  This  visualization  allows  for  an 
inspection  of  the  data  structure,  and  the  likely  comparability  of  the 
replicates  among  each  other  and  between  the  two  experimental  groups 
(antiestrogen-responsive  MCF7/LCC1  and  antiestrogen-resistant 
MCF7/LCC9).  For  this  top  level  visualization,  the  replicate  gene 
expression  profiles  for  MCF7/LCC1  and  MCF7/LCC9  exist  within 
linearly  separable  regions  of  the  gene  expression  data  space  after 
elimination  of  one  outlier  array  from  each  experimental  group.  The 
top  three  principal  components  capture  81.2%  of  the  cumulative 
variance  in  the  data  (597  dimensions).  Thus,  the  data  structure  is 
consistent  with  differences  in  the  gene  expression  profiles  as  predicted 
by  the  known  differential  antiestrogen  responsiveness  of  the  two 
variants. 

Genes Implicated by Gene Microarray Studies. The data in
Table 3 show the fold-differences in expression of selected genes
identified in the Clontech Atlas gene microarray studies, chosen using
the criteria described in “Materials and Methods.” The selection was
not intended to describe fully the data set, only to assist in an initial
exploration of the data. This small but rational subset of genes could
be additionally evaluated in focused studies to confirm the differential
expression patterns and establish potential functional relevance.
Furthermore, if members of this subset were truly differentially
expressed, we could begin to understand how cells perceive
antiestrogens and adapt to this selective pressure.

To determine whether these genes are broadly representative of the
differences between the gene expression profiles of MCF7/LCC1 and
MCF7/LCC9 cells, we generated a three-dimensional projection from
the seven-dimensional gene expression data space (Fig. 1B). This was
necessary because we used several criteria to construct the subset,
including some genes where fold-regulation or distribution of the data
were given more weight than formal statistical significance.
Consequently, we could not assume that we had maintained the linear
separability of the data, at the top level, as seen in all 597 dimensions.

We  might  not  expect  this  small  subset  of  expression  data  (<2%  of 
the  information)  to  prove  as  effective  in  representing  the  respective 
phenotypes  as  the  full  data  set  (597  genes).  Nonetheless,  as  for  the 
597-dimension  visualization,  after  elimination  of  outlier  data  the 
seven-dimensional  MCF7/LCC1  and  MCF7/LCC9  profiles  remain  in 
linearly separable, three-dimensional data space. The top three principal
components capture 98.9% of the cumulative variance in the
data (seven dimensions). This observation suggests that these data
contain information that contributes to the differences in the molecular
profiles of these two variants, that these genes may contribute to
the respective biological phenotypes, and that additional studies of
their potential functional relevance are warranted.

Genes expressed at a higher level in the MCF7/LCC1 cells include
EGF-R, EGR-1, IRF-1, and both TNFα and its R1 receptor (TNF-R1).
A well-established inverse relationship exists between the expression
of EGF-R and ER in breast tumors (51). EGF-R can induce expression
of EGR-1 (52), and expression of both genes is lower in MCF7/LCC9
cells. EGR-1 is a transcription factor with proapoptotic activity (53)
that can block NFkB function (54) and repress TGF-β receptor
expression (29). EGR-1 expression is down-regulated in
7,12-dimethylbenz(a)anthracene-induced mammary adenocarcinomas in rats (55).
IRF-1 is an IFN-regulated transcription factor that functions as a
tumor suppressor gene (56, 57) and is induced by TNFα (58). A
TNFα-mediated pathway for signaling to apoptosis occurs in MCF-7
human breast cancer cells (59, 60), and measuring serum TNF
concentrations may be a useful prognostic marker in breast cancer
patients (61). Furthermore, HER-2/neu can confer resistance to
TNFα-induced apoptosis in breast cancer cells, using a mechanism that
involves activation of NFkB (62). We have previously implicated
overexpression of superoxide dismutase in resistance to TNFα in
MCF-7 cells (63). Superoxide dismutase appears to be up-regulated in
MCF7/LCC9 cells (Table 3) and in TAM-stimulated MCF-7
xenografts (64). NFkB (p65/RelA) appears to be expressed at higher levels in
MCF7/LCC9 cells. NFkB is overexpressed in ER-negative breast
cancer cells (65) and has an important role in the development of the
normal mammary gland (66).

NPM, EGF-R, and IRF-1 Are Differentially Expressed in
MCF7/LCC1 and MCF7/LCC9 Cells. The data in Table 2 and
Table 3 predict differential expression of NPM, EGF-R, and IRF-1
between MCF7/LCC1 and MCF7/LCC9 cells. To confirm these
observations, we measured the levels of the EGF-R (immunofluorescence)
and NPM proteins (Western blot) and IRF-1 mRNA (RNase
protection). The data in Fig. 2A show that MCF7/LCC9 cells express
lower amounts of EGF-R than MCF7/LCC1 cells. NPM protein
expression is significantly increased in MCF7/LCC9 cells compared
with MCF7/LCC1 cells (Fig. 2B; P < 0.02), consistent with the
predicted data from the SAGE analyses (Table 2) and our previous
studies (23, 38). The higher levels of IRF-1 mRNA, seen in the
antiestrogen-responsive MCF7/LCC1 cells in Table 3, are confirmed
by RNase protection analysis (Fig. 2C; P = 0.005). Both the gene
microarray and RNase protection analyses show an ~2-fold higher
level of IRF-1 expression in MCF7/LCC1 cells, when compared with
the antiestrogen-resistant MCF7/LCC9 cells.

Transcriptional Regulatory Activities of NFkB and CRE Are
Increased in MCF7/LCC9 Cells. The increased expression of NFkB
(gene expression microarray) and XBP-1 (SAGE) implies increased
transcriptional activation of promoters containing NFkB and CRE
response elements, respectively. We confirmed these observations
directly, using commercially available promoter-reporter assays to
measure transcriptional activities. The data in Fig. 3 show that the
basal activity of both promoters is increased in MCF7/LCC9 cells:
~10-fold for NFkB and 4-fold for CRE (P < 0.02). The increase in
transcriptional activation of the NFkB constructs is greater than that
predicted by the gene array data, but mRNA, protein, and protein/
DNA binding activities can be poor predictors of the functional
activation of some transcription factors (67). This prediction is not
problematic for XBP-1, where the 4-fold increase in mRNA expression
identified by SAGE (Table 2) compares well with the 4-fold
increase in basal transcriptional activation (Fig. 3B).

We next assessed whether ICI 182,780, the antiestrogen used to
Fig. 4. Regulation of NFkB and CRE transcription by ICI 182,780 in MCF7/LCC1 and
MCF7/LCC9 cells. A, NFkB (*P < 0.001, MCF7/LCC1 versus MCF7/LCC9). B, CRE
(not significant). NFkB and CRE data are represented as mean of transcriptional activation
expressed as a percentage of controls (vehicle-treated cells of the same cell line);
bars, ±SE. Cells were grown in CCS-IMEM and treated with 10 nM ICI 182,780 for 48 h
before measuring reporter gene expression.


generate the MCF7/LCC9 cells, could regulate the transcriptional
activities of NFkB and CRE. Whereas ICI 182,780 inhibits NFkB
activity in the MCF7/LCC1 cells (TAM- and ICI 182,780-responsive),
this regulation is lost in the TAM and ICI 182,780
cross-resistant MCF7/LCC9 cells (Fig. 4A). In contrast, ICI 182,780
treatment does not alter the transcriptional regulatory activities of the CRE
promoter in any of these variants (Fig. 4B).

MCF7/LCC9 Cells Are Specifically Responsive to an Inhibitor
of NFkB Activity. The increased activation of NFkB and loss of its
estrogenic regulation in MCF7/LCC9 cells suggest that these cells
might now be partly dependent on NFkB signaling for survival/
growth. Consequently, we compared the growth response of MCF7/
LCC1 and MCF7/LCC9 cells to parthenolide, a potent and specific
inhibitor of NFkB that can inhibit the inhibitor of NFkB kinase
repressor of NFkB (68, 69) and also binds NFkB in a highly
stereospecific manner to block DNA binding (70). Parthenolide produces
a dose-dependent inhibition of MCF7/LCC9 cells, with an apparent
IC50 of ~600 nM (Fig. 5). In contrast, parthenolide does not
significantly affect growth of MCF7/LCC1 cells at these concentrations.
MCF7/LCC9 cells are significantly more dependent on the
transcriptional regulatory activities of NFkB than their ICI 182,780-responsive
parental cells (P < 0.01 for MCF7/LCC9 versus MCF7/LCC1 at both
300 nM and 600 nM parthenolide).

DISCUSSION 

We  have  begun  to  identify  the  molecular  changes  associated  with 
cell  survival  after  prolonged  ICI  182,780  treatment  in  breast  cancer 
cells. Whereas we have not attempted to confirm the altered expression
of all implicated genes, some expression patterns are consistent
with the activities we have confirmed. Here we discuss only those
genes for which altered mRNA, protein, and/or transcriptional
activation have been confirmed, and that are known to interact with each
other in various cellular models, i.e., IRF-1, NPM, NFkB, and CRE.


IRF-1 can function as a tumor suppressor and can signal to
apoptosis through both p53-dependent and p53-independent pathways
(71). These observations may partly reflect the ability of IRF-1 to
induce a caspase cascade through activation of either caspase 1 (ICE;
Ref. 72) and/or caspase 7 (73). Caspase 1 is involved in regulating
apoptosis in normal mammary epithelial cells (74), and overexpression
of caspase 1 is lethal in MCF-7 human breast cancer cells (75).
Preliminary data from our laboratory demonstrate that overexpression
of IRF-1 inhibits anchorage-dependent colony formation and that the
rate of cell proliferation in MCF-7 cells is inversely related to the level
of IRF-1 expression (76). These data suggest that the down-regulation
of IRF-1 in MCF7/LCC9 cells may protect these cells from
IRF-1-induced inhibition of proliferation and/or induction of apoptosis.

NPM can function as an oncogene, its overexpression fully transforming
NIH 3T3 cells in a standard assay for oncogenic potential
(77). We have shown that levels of autoantibodies to NPM increase in
breast cancer patients 6 months before their recurrence. Consistent
with an estrogenic/antiestrogenic regulation of NPM, the levels of
these autoantibodies are lower in breast cancer patients that have
received TAM (38). The increased NPM expression in MCF7/LCC9
cells compared with MCF7/LCC1 cells may reflect oncogenic potential
of NPM, an activity potentially related to its ability to inhibit
IRF-1 function (see below).

NFkB  has  been  implicated  in  resistance  to  cytotoxic  drugs  and  can 
function  as  a  survival  factor  in  various  cell  types  (78).  Several  aspects 
of  normal  mammary  gland  development  appear  dependent  on  NFkB 
activity  (66),  perhaps  partly  reflecting  its  estrogenic  regulation  (65). 
Elevated NFkB activity arises early during neoplastic transformation
in the rat mammary gland (79). Widely expressed in breast cancer
cells and tumors, elevated NFkB activity is associated with
estrogen-independence (65, 66). Currently, NFkB is the only protein known
to induce BRCA2 expression (80). ICI 182,780 cannot suppress the
increased NFkB activity in MCF7/LCC9 cells, despite inhibiting
this function in ICI 182,780-responsive cells (MCF7/LCC1). The
functional relevance of this observation was tested directly using
parthenolide, which both specifically binds NFkB and blocks
degradation of the endogenous NFkB inhibitor IkB, resulting in
the inhibition of NFkB transcriptional regulatory activities (68,
70). This activity of parthenolide has been used to evaluate the
functional role of NFkB in several recent studies (68, 69, 81, 82).
MCF7/LCC9 cells are significantly more sensitive to growth inhibition
by parthenolide than their MCF7/LCC1 parental cells. This


[Parthenolide] 

Fig.  5.  Response  to  inhibition  of  NFkB  activity  by  parthenolide.  Data  represent  mean 
of  four  determinations,  where  absorbance  in  each  treated  population  is  expressed  as  a 
percentage  of  the  absorbance  in  control  cells  (vehicle  treated  cells  of  the  same  cell  line). 
*p  «■-  0.01  MCF-7/LCC1  versus  MCF7/LCC9.  Cells  were  grown  in  CCS-IMEM  without 
( control:  vehicle  only)  or  with  parthenolide  supplementation  (300  nM:  600  nM). 
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observation  is  consistent  with  a  greater  functional  reliance  on  NFkB 
activation  for  cell  growth/survival,  and  implies  that  one  option  for 
surviving  antiestrogen  exposure  is  the  up-regulnlion  of  an  estrogen- 
regulated  survival  lactor(s)  concurrent  with  the  loss  of  its  ER- 
mediated  regulation.  Furthermore,  parthenolidc  is  now  in  clinical 
trials,  and  our  data  suggest  that  it  may  prove  useful  in  combination 
with  Faslodex  or  other  antieslTogens  to  either  increase  responsiveness 
and/or  delay  the  appearance  of  resistant  disease. 

XBP-1 has been identified recently in clusters of genes associated
with ER-positive breast tumors in two independent studies
(13, 41), and its expression is increased in MCF7/LCC9 cells.
XBP-1 is a transcription factor that binds and activates CRE (39).
The importance of CRE-regulated events is widely reported in
many cell types (83, 84). These events include a likely role in
signal transduction either at or downstream of ER and PgR (85).
The relevance of increased CRE activity in MCF7/LCC9 cells is
additionally supported by recent evidence that CRE-decoy
oligonucleotides inhibit the growth of MCF-7 cells (86). We detected a
4-fold increase in CRE transcriptional activation in MCF7/LCC9
cells. Importantly, ICI 182,780 cannot regulate CRE activity
in either MCF7/LCC1 (ICI 182,780-responsive) or MCF7/LCC9
(resistant) cells. These data imply an additional option available
to breast cancer cells: a switch to signaling pathways that are
normally independent of ER-mediated signaling.

IRF-1, NPM, NFkB, and CRE are known to affect cell proliferation,
apoptosis, and/or carcinogenesis. Two critical protein-protein
interactions directly link the IRF-1, NFkB, and NPM proteins. Direct
binding occurs between IRF-1 and NPM (77), and between IRF-1 and
NFkB (87, 88). In both cases, the interactions with IRF-1 have
important effects on gene transcription and cell signaling. NPM binding
inhibits the transcription regulatory activities of IRF-1 (77). A
coordinated perturbation in the regulation of these two genes has
occurred in the MCF7/LCC9 cells: NPM is up-regulated and IRF-1 is
down-regulated. Thus, overexpression of NPM could additionally
reduce the activity of the remaining lower levels of IRF-1, potentially
blocking/eliminating its ability to initiate an apoptotic caspase
cascade through caspase 1 and/or caspase 7. Such an effect would likely
also eliminate the ability of IRF-1 to induce p21cip1/waf1 (89) and
cooperate with wild-type p53 in signaling to apoptosis (56, 57). Changes
in the amount of available IRF-1 will directly affect the number of
IRF-1:NFkB heterodimers available to regulate an additional series of
genes. Whereas NFkB will compete with NPM for IRF-1 binding, their
relative affinities for IRF-1 are unknown, and the preferred IRF-1
heterodimer remains to be established. IRF-1:NFkB protein-protein
interactions or other cooperative interactions are implicated in the
induction of ATF-2/jun (90), RANTES (91), VCAM-1 (88), interleukin 6
(92), and MHC class I genes (87). A functional IFN-β
enhanceosome has been described that includes IRF-1, NFkB, and
ATF2/jun (93). The importance of both IRF-1 and NFkB in IFN-induced
signaling may contribute to the ability of IFNs to increase
responses to antiestrogens (94-96).

CRE activation also may interact with the pathways regulated by
IRF-1, NFkB, and NPM interactions. Delgado et al. (97) described a
cyclic AMP-dependent pathway that inhibits IRF-1 transactivation.
Thus, the increased CRE activity in MCF7/LCC9 cells may explain,
in part, the lower IRF-1 mRNA levels seen both in the gene expression
arrays and in the IRF-1 RNase protection studies.

The concurrent changes in NPM, IRF-1, NFkB, and CRE suggest
a novel integrated signaling pathway that may involve the ability of
NPM and CRE to inhibit IRF-1 initiation of a caspase cascade to
apoptosis, the altered ability of cells to induce genes dependent on
IRF-1:NFkB, and an increased activation of survival pathways that
involve both NFkB and CRE. Studies to additionally establish the
nature, function, and regulation of this putative pathway are currently
in progress, including overexpression of NFkB in sensitive cells
and a dominant-negative approach in resistant cells. Because we
looked only at cells that survived long-term antiestrogen exposure, the
ability of the changes implicated in the present study to protect from
an initial or short-term exposure has yet to be determined. For
example, cells may or may not survive an initial antiestrogenic exposure
by the same mechanisms that allow for long-term survival.
Irrespective  of  whether  these  other  genes  are  functionally  involved, 
their  patterns  of  expression  may  be  important  in  better  predicting  the 
25%  of  ER+/PgR+,  55%  of  ER-/PgR+,  and  66%  of  ER+/PgR- 
breast  tumors  that  do  not  respond  to  antiestrogens  (2). 

It is not possible, in a single focused study, to define all of the
potentially differentially expressed genes nor to establish their
functional relevance firmly. Because the number of cellular models
studied is small, additional functional studies where expression of the
candidate genes is induced or repressed are in progress. Nonetheless,
our data imply that breast cancer cells have highly plastic
transcriptomes, with access to several signal transduction pathways for
regulating the choice to differentiate, proliferate, or die. For example,
MCF7/LCC9 cells have taken several possible interactive/interdependent
approaches to circumvent the growth inhibitory effects of
antiestrogens. This plasticity in gene expression patterns is consistent
with the marked heterogeneity apparent in the clinical disease (2, 98).

In  summary,  our  data  suggest  that  one  molecular  profile  associated 
with  surviving  prolonged  antiestrogen  exposure  may  include  loss  of 
ER-mediated  signaling  to  apoptosis  through  IRF-1.  This  lost  signaling 
is  achieved  both  by  down-regulation  of  IRF-1  and  a  coordinated 
up-regulation  of  its  inhibitor  NPM,  and  possibly  another  protein 
partner NFkB. Up-regulation of CRE activities also is implicated
in this molecular profile. Other patterns of gene expression may
provide alternative routes to the resistant phenotype or in cells that
acquire a TAM-stimulated phenotype (2). The identification of these
molecular  profiles  and  signaling  pathways  may  ultimately  allow  us  to 
understand  ER-regulated  signaling,  facilitate  the  development  of  novel 
treatment  strategies,  and  allow  clinicians  to  better  identify  antiestrogen- 
responsive and -resistant breast tumors.

Iterative Normalization of cDNA Microarray Data

Yue  Wang,  Jianping  Lu,  Richard  Lee,  Zhiping  Gu,  and  Robert  Clarke 


Abstract — This  paper  describes  a  new  approach  to  normalizing 
microarray expression data. The novel feature is to unify the tasks
of estimating normalization coefficients and identifying the control
gene set. Unification is realized by constructing a window function
over the scatter plot defining the subset of constantly expressed
genes and by effecting optimization using an iterative procedure.
The structure of the window function gates contributions to the
control gene set used to estimate normalization coefficients. This
window  measures  the  consistency  of  the  matched  neighborhoods 
in  the  scatter  plot  and  provides  a  means  of  rejecting  control  gene 
outliers.  The  recovery  of  normalizational  regression  and  control 
gene  selection  are  interleaved  and  are  realized  by  applying  coupled 
operations  to  the  mean  square  error  function.  In  this  way,  the  two 
processes  bootstrap  one  another.  We  evaluate  the  technique  on  real 
microarray  data  from  breast  cancer  cell  lines  and  complement  the 
experiment  with  a  data  cluster  visualization  study. 

Index  Terms — Data  normalization, dynamic  programming,  gene 
expression,  gene  microarray,  linear  regression. 

I.  Introduction 

SPOTTED cDNA microarrays are emerging as a powerful
and cost-effective tool for the large-scale analysis of gene
expression. Using this technology, the relative expression levels
in two or more mRNA populations derived from tissue samples
can be assayed for thousands of genes simultaneously [1], [2].
Microarrays are potentially powerful tools for investigating the
mechanism of drug action. Two recent studies have described
the application of high-density microarrays to examine the effects
of drugs on gene expression in yeast as a model system. A
similar method applied to human breast cancer cells and tissues
would have direct utility in the identification and validation of
novel therapeutics. It is widely accepted that the pattern of genes
expressed within a specific cell is essentially responsible for its
phenotype. The most widely publicized use of gene microarrays
has been in cancer research.

From a statistical point of view, sources of measurement error
within an array, and variation between arrays, must be quantified
and taken into account in order to make indirect comparisons
among samples that have not been directly assayed on the
same array. For example, gene microarrays vary with production
batches, e.g., introducing variations in the amount of probe
that hybridizes to areas of the support that do not contain target
cDNAs, or the amount of the cDNA spotted onto the support
surface. The specific activity of the probe will vary from probe
to probe, often reflecting variations in the amount of signal produced
by each molecule of label incorporated into the probe.

Fig. 1. Example of cDNA microarray image.

Two major data preprocessing operations are involved: background
correction and interexperiment normalization. In background
correction, local sampling of background can be used
to specify a threshold that a true signal must exceed. It is even
possible to accurately detect weak signals and extract a mean
intensity above background for the target [3]. A typical cDNA
array image is given in Fig. 1.

In carrying out comparisons of expression data using measurements
from a single array or multiple arrays, the question
of normalizing data arises. A reasonable assumption, adopted
by most researchers, is that all experiments are carried out under
conditions of a large excess of immobilized probe relative to labeled
target. The kinetics of hybridization are therefore pseudo-first
order, and interprobe competition is not a factor [3]. Under
these assumptions, the linear differences arising from the exact
amount of applied target, extent of target labeling, efficiencies
of fluor excitation and emission, and detector efficiency can be
compounded into a single variable. Two major strategies can be
used to carry out normalization. One is based on a consideration
of all of the genes in the sample, and the other on a designated
subset expected to be unchanged over most circumstances,
called the control gene set. In instances of closely related samples,
global normalization (i.e., using all genes) will be a useful
tool. As samples become more divergent, a good normalization
may be achieved using a subset of constantly expressed genes
(i.e., using only control genes) [3].

The work most closely related to our methodology was reported
in [4]. The authors introduced a comparison of gene expression
levels arising from cohybridized samples by taking ratios
of average expression levels for individual genes. A novel
method of image segmentation was presented to identify cDNA
target sites, and a hypothesis test and confidence interval were
developed to quantify the significance of observed differences
in expression ratios. In particular, the probability density of the
ratio and the maximum-likelihood estimator for the distribution
were derived, and an iterative procedure for signal calibration
was developed. In general, however, an integral of ratios is not
the same as a ratio of integrals, and simple ratios of the data will
not necessarily provide unbiased estimates of expression ratios.
Alternatively, the mean value of all signals on the hybridized
filter can be used for normalization, and further normalizations
can be done to a reference hybridization [5]. Nonetheless, the
optimal approach remains controversial.

II.  Method  and  Algorithm 

In this paper, we adopt a somewhat different approach to
the problem of normalizing microarray expression data. Rather
than rejecting those control genes that give rise to a large normalization
error, we attempt to iteratively correct them. In a
nutshell, our idea is to bootstrap by alternating between estimating
normalization coefficients and identifying the control gene
subset. The framework is furnished by constructing a window
function over the scatter plot defining the subset of constantly
expressed genes. Specifically, this window measures the consistency
of the matched neighborhoods in the scatter plot and
provides a means of rejecting control gene outliers. We evaluate
the technique on real microarray data from breast cancer
cell lines and complement the experiment with a data cluster
visualization study.

Our goal is to generate a transformation that best maps the
expression levels of the floating data set onto their counterparts in a
reference data set. Assume that data points $\{x_1, x_2, \ldots, x_{n_c}\}$
and $\{y_1, y_2, \ldots, y_{n_c}\}$ are the expression levels of the control
or housekeeping genes from two microarray experiments, where
$n_c$ is the total number of control genes. In this paper, we use
$\{x_i\}$ as the floating data set and $\{y_i\}$ as the reference data
set. We further assume that the normalization can be accurately
achieved through a linear regression mapping

$$y_i = a x_i + b \tag{1}$$

where $a$ is the true ratio of the data and $b$ is the bias correction of
the data. Since there are two free parameters in the transformation,
the estimation of their values requires a minimum of two
data points that are known to be in correspondence. Considering
the effect of noise, however, more control points are needed to
produce an accurate estimate. This process is overconstrained
and can be solved using least squares estimation. Clearly, a natural
criterion is the minimum mean squared error between the
two control data subsets. Based on the expression levels of the
control genes, the mean squared error (MSE) can be written as

$$\mathrm{MSE} = \frac{1}{n_c} \sum_{i=1}^{n_c} \left[ y_i - (a x_i + b) \right]^2. \tag{2}$$

Thus, the search principle for estimating the optimal values of
$a$ and $b$ is simply taking the partial derivatives of the MSE and
setting them to zero. It can be shown that the estimated linear
regression coefficients $a$ and $b$ can be calculated by [6]

$$a = \frac{\sum_{i=1}^{n_c} (x_i - \mu_x)(y_i - \mu_y)}{\sum_{i=1}^{n_c} (x_i - \mu_x)^2} \tag{3}$$

$$b = \mu_y - a \mu_x \tag{4}$$

where $\mu_x$ and $\mu_y$ are the means of $\{x_i\}$ and $\{y_i\}$, respectively,
given by

$$\mu_x = \frac{1}{n_c} \sum_{i=1}^{n_c} x_i, \qquad \mu_y = \frac{1}{n_c} \sum_{i=1}^{n_c} y_i \tag{5}$$

and the normalization shall be performed for all of the data
points in the floating data set based on (1).

The accuracy of the method depends heavily on the selection
of control genes. In addition to the predetermined control
genes, including housekeeping genes, we shall add more
control genes based on the reasonable heuristic that genes
that are nondifferentially expressed should be considered as
control genes in normalization. Posed in this way, there is a
basic "chicken-and-egg" problem [7]. Before a good control
gene subset can be defined, expression levels of all genes need
to be reasonably normalized. Yet, this normalization is, after
all, the ultimate goal of computation.

We propose an iterative regression normalization algorithm to
solve this problem. First, solely based on the predetermined control
genes such as housekeeping genes, we will conduct an initial
normalization of all data sets based on (1)-(5). Since an accurate
data analysis requires several repetitive cDNA hybridizations
in microarray studies [8], starting from the whole data set,
we will then eliminate those genes from the control gene list
whose expressions have a large standard deviation across replications,
namely, outliers, according to the criterion given by

… (6)

for all genes, where $n_i$ is the number of replications for gene
$i$ in the experiment, $x_{ij}$ is the expression level of gene $i$ in
the $j$th replication, $\mu_i$ is the mean of replications, and $\epsilon_1$ is a
predetermined threshold.

In our experiment, $\epsilon_1$ is determined as follows. For each of the
genes, the replications are normalized by its mean and the normalized
standard deviation is calculated. A mean standard deviation
is then obtained by the sample average of the individual
normalized standard deviations. Our experience has shown that
setting $\epsilon_1$ to twice the mean standard deviation is appropriate
and effective. It should be noted that this criterion will also eliminate
differentially expressed genes from the control gene list.
Thus, a gene will be selected as a control gene if its expression
level pair across reference and floating experiments satisfies

$$\epsilon_2 < \sqrt{x_i^2 + y_i^2} < \epsilon_3 \quad \text{and} \quad \ldots \tag{7}$$

where $\epsilon_2$, $\epsilon_3$, and $\epsilon_4$ are the empirically predetermined thresholds
defining the subset of constantly expressed genes. It can be seen
that (7) defines a window function over the scatter plot. A typical
window function is illustrated in Fig. 2. In particular, in our
experience, ratios can be very unstable when one (or both) of the
signals is small or large. Thus,
we further eliminate unstably expressed genes from the control
gene list using the constraints defined by $\epsilon_2$ and $\epsilon_3$. Clearly, $\epsilon_4$
provides the boundaries of a constantly expressed gene subset.

Fig. 2. Example of control gene selection window, (a) before and (b) after normalization.
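
As an illustration of how a window of this kind gates the control gene set, the following minimal Python sketch applies the magnitude bounds from (7). The second condition of (7), the one involving the third window threshold, is not legible in this copy, so the sketch assumes a hypothetical stand-in: the absolute difference between the normalized floating value and the reference value must be below `eps4`. The function name `select_control_genes` and all parameter names are illustrative and do not come from the paper.

```python
import math


def select_control_genes(x, y, eps2, eps3, eps4):
    """Return indices of genes that fall inside the selection window.

    x, y : expression levels of the same genes in the floating and
           reference experiments (x is assumed already normalized).
    eps2, eps3 : lower and upper bounds on the signal magnitude
           sqrt(x_i^2 + y_i^2), as in the legible part of (7).
    eps4 : ASSUMED bound on |y_i - x_i|; the actual second condition
           of (7) is not recoverable from the source text.
    """
    selected = []
    for i, (xi, yi) in enumerate(zip(x, y)):
        magnitude = math.hypot(xi, yi)
        in_stable_range = eps2 < magnitude < eps3   # reject very small/large signals
        near_diagonal = abs(yi - xi) < eps4         # hypothetical constancy test
        if in_stable_range and near_diagonal:
            selected.append(i)
    return selected
```

In this sketch a weak gene, a strongly differential gene, and a saturated gene are all rejected, leaving only mid-range genes whose two measurements agree.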

The algorithm first generates the interim scatter plot of the
data sets through the observations and the current parameter
estimates [see (1)] and then updates parameter estimates using a
newly defined control gene subset [see (3)-(5)]. The procedure cycles
back and forth between these two steps until it reaches a
stationary point where no significant change occurs to the content
of the control gene subset. A summary of the major steps is
given as follows.

1) Based on predetermined control genes including housekeeping
genes, estimate initial values of $a$ and $b$ and
perform an initial normalization using (3)-(5) and (1),
where only one data set is used as a reference set and all
other data sets are considered as floating sets and shall be
normalized to the reference set.

2) Eliminate those genes from the control gene list whose
expressions have a large standard deviation across replications,
according to the criterion given by (6).

3) For each of the experiment pairs, construct a new control gene
subset by selecting additional control genes that satisfy (7).

4) Based on the newly constructed control gene subset,
estimate interim values of $a^{(m)}$ and $b^{(m)}$ and perform
data normalization for each of the floating data sets using
(3)-(5) and (1), where $m$ is the iteration index.

5) Repeat Steps 3) and 4) until convergence ($a^{(\infty)} \to 1$
and $b^{(\infty)} \to 0$) is reached or no significant change occurs
to the content of the control gene subset.
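
The steps above can be sketched in Python for a single floating/reference pair. This is a minimal illustration under stated assumptions, not the authors' implementation: the least squares fit follows (3)-(5); the replicate filter of Step 2) is omitted because (6) is not legible here; and the window test reuses the legible magnitude bounds of (7) together with an assumed condition $|y_i - x_i| < \epsilon_4$. All function and parameter names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least squares slope a and intercept b for y = a*x + b, as in (3)-(5)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    sxx = sum((xi - mu_x) ** 2 for xi in x)  # must be nonzero (x not constant)
    sxy = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mu_y - a * mu_x
    return a, b


def iterative_normalize(x, y, housekeeping, eps2, eps3, eps4,
                        max_iter=50, tol=1e-6):
    """Normalize floating data x to reference y; return (normalized x, control set)."""
    # Step 1: initial normalization from the predetermined control genes.
    a, b = fit_line([x[i] for i in housekeeping], [y[i] for i in housekeeping])
    xn = [a * xi + b for xi in x]
    control = sorted(housekeeping)
    for _ in range(max_iter):
        # Step 3: rebuild the control set with the (partly assumed) window.
        new_control = [
            i for i, (xi, yi) in enumerate(zip(xn, y))
            if eps2 < math.hypot(xi, yi) < eps3 and abs(yi - xi) < eps4
        ]
        if len(new_control) < 2:
            break  # too few genes to fit a line; keep the last estimate
        # Step 4: re-estimate the coefficients and renormalize.
        a, b = fit_line([xn[i] for i in new_control], [y[i] for i in new_control])
        xn = [a * xi + b for xi in xn]
        # Step 5: stop when a -> 1 and b -> 0, or the control set is stable.
        converged = abs(a - 1.0) < tol and abs(b) < tol
        unchanged = new_control == control
        control = new_control
        if converged or unchanged:
            break
    return xn, control
```

Because each pass renormalizes the already-normalized data, the interim coefficients approach $a = 1$ and $b = 0$ as the control set settles, which matches the stopping rule in Step 5).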

The philosophy for estimating normalization coefficients and
identifying a control gene set is similar in spirit to the
self-organization principle [9], [10]. The structure of the window function
gates contributions to the control gene subset used to estimate
normalization coefficients such that possible oscillation during
algorithm convergence can be prevented. Specifically, the
window function defines a neighborhood of the scatter centroid
that gates the consistency contribution of the control gene subset
to normalization. By making the value $\epsilon_4$ of the topological
window function decrease with time, the neighborhood is
initially very large and shrinks slowly to its final desired size
(e.g., a nearest neighbor structure). A popular choice for the
dependence of $\epsilon_4$ on discrete time $m$ is the exponential decay
[9]. In addition, the actual algorithm implementation concerns
the issue of numerical stability. We have applied a simple
dynamic programming technique to estimating normalization
coefficients, called a factoring-shifting (FS) procedure.

… (8)-(11)

where at each complete cycle of the procedure, we first use the
"old" set of floating data to determine the normalization factor
$a^{(k|m)}$ using (8) by setting $b = 0$ and simply rescale floating
data values using (9). These interim results $x_i^{*(k|m)}$ are then used
to obtain the normalization shift $b^{(k|m)}$ using (10) by setting
$a = 1$ and further generate "new" values $x_i^{(k+1|m)}$ of floating
data using (11). The procedure cycles back and forth until the
values of $a^{(\infty,m)}$ and $b^{(\infty,m)}$ reach their stationary points.
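
The alternation described above can be sketched as follows. Equations (8)-(11) are not legible in this copy, so the update rules here are assumptions derived by minimizing the MSE in (2) under each constraint: with $b = 0$ the least squares factor is $\sum x_i y_i / \sum x_i^2$, and with $a = 1$ the least squares shift is the mean of $y_i - x_i$. The function name and parameters are hypothetical.

```python
def factoring_shifting(x, y, max_iter=1000, tol=1e-10):
    """Alternate a scale-only fit and a shift-only fit until both settle.

    Returns the normalized floating data and the accumulated overall
    coefficients (a_total, b_total) such that x_norm = a_total * x + b_total.
    The two update rules are ASSUMED forms, not the paper's (8)-(11).
    """
    x = list(x)
    n = len(x)
    a_total, b_total = 1.0, 0.0
    for _ in range(max_iter):
        # Factoring step: best scale with the shift held at zero.
        a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        x = [a * xi for xi in x]
        # Shifting step: best shift with the scale held at one.
        b = sum(yi - xi for xi, yi in zip(x, y)) / n
        x = [xi + b for xi in x]
        # Compose this cycle with the transformation accumulated so far.
        a_total, b_total = a * a_total, a * b_total + b
        if abs(a - 1.0) < tol and abs(b) < tol:
            break
    return x, a_total, b_total
```

At a stationary point of this sketch the residuals $y_i - x_i$ have zero mean and are uncorrelated with $x_i$, which are the same conditions that define the regression solution in (3) and (4). The number of cycles needed depends on the data.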

In relation to previous work, the concept of using linear regression
analysis for microarray normalization can be traced
back to [4] and was further developed in [12] for iterative regression
in conjunction with control gene selection. Such approaches
are based on several assumptions regarding the data
and can be considered as special cases of our framework [5].

The primary assumption is that for either the entire collection
of arrayed genes or some subset such as housekeeping genes, the
shift of the measured expression averaged over the set is zero
(i.e., $b = 0$) and the ratio of the normalized expression pair averaged
over the set should be one [i.e., $(1/n_c) \sum_{i=1}^{n_c} y_i/(a x_i) = 1$].
Under these assumptions, there are basically three major approaches
for calculating the normalization factor $a$ [5]. The first
simply uses the mean value of all the background-corrected signals.
Normalization can be separately performed for each of
the data sets, without explicitly calculating $a$ and selecting a
reference data set [15]. Specifically, if a raw data pair is denoted
by $(\{x_i\}, \{y_i\})$, normalization leads to
$(\{x_i / \sum_{i=1}^{n_c} x_i\}, \{y_i / \sum_{i=1}^{n_c} y_i\})$.
By multiplying the pair by $\sum_{i=1}^{n_c} y_i$, the result
is equivalent to using (4), that is, $(\{a x_i\}, \{y_i\})$, where
$a = \sum_{i=1}^{n_c} y_i / \sum_{i=1}^{n_c} x_i$ and $b = 0$. A second approach uses
simplified linear regression analysis, called linear regression
through the origin [6]. Consequently, a scatter plot of the normalized
data set pair should have a slope of one [5]. By setting
$b = 0$ in (1) and (2), the normalization factor is given
by $a = \sum_{i=1}^{n_c} x_i y_i / \sum_{i=1}^{n_c} x_i^2$. A third approach relies on the
assumption that, for control genes, the distribution of expression
levels can be modeled and the mean of the ratio adjusted
to one [4]. An iterative procedure was developed to estimate $a$
by $(1/n_c) \sum_{i=1}^{n_c} y_i/x_i$, once again setting $b = 0$. It should be
noticed that some heuristic approximations have been made in
using these approaches, since, in general

…

while our method is presented as a standard linear regression
analysis without any approximation step.
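
To make the difference between the three factor estimates concrete, the following small Python sketch computes each of them for the same data pair. The function names are illustrative only; the formulas are the ratio of sums, the regression-through-the-origin slope, and the mean of ratios as described above, each with $b = 0$.

```python
def ratio_of_sums(x, y):
    """First approach: a = sum(y_i) / sum(x_i)."""
    return sum(y) / sum(x)


def regression_through_origin(x, y):
    """Second approach: a = sum(x_i * y_i) / sum(x_i ** 2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def mean_of_ratios(x, y):
    """Third approach: a = (1 / n) * sum(y_i / x_i); requires nonzero x_i."""
    return sum(yi / xi for xi, yi in zip(x, y)) / len(x)


# The three estimates coincide only in special cases, such as exactly
# proportional data; on scattered data they generally differ.
x_demo = [1.0, 2.0, 4.0]
y_demo = [2.0, 5.0, 6.0]
estimates = (
    ratio_of_sums(x_demo, y_demo),
    regression_through_origin(x_demo, y_demo),
    mean_of_ratios(x_demo, y_demo),
)
```

For the small demonstration pair the three values are 13/7, 36/21, and 2, so the choice of estimator changes the normalization factor even before any shift term is considered.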


III.  Experiment  and  Discussion 

In this section, we will provide experimental evaluation of our
new normalization method. This investigation has two related
strands. First, we will furnish examples demonstrating the use
of an iterative normalization scheme on real microarray data.
Here, we will use two different data sets. The first of these
involves within-class normalization of data from LCC1 breast
cancer cell lines across replications. The second example involves
normalizing between-class breast cancer cell line data
from LCC1 against LCC9, whose phenotypes are known to be
different from LCC1.


In the second strand of our experiments, we will provide an
algorithm accuracy analysis. Here, we confine ourselves to the
linear regression variant of the normalization process. The aim
is to experimentally compare our iterative algorithm with the
performance of each of its components taken individually, and thus
to demonstrate that the combined processing of both control
gene selection and transformation coefficient estimation yields
significant advantages over existing methods. In addition, we
would like to acknowledge that although the cell lines are not
fully representative of solid tumors in humans, their gene
expression profiles are rich in information with respect to
drug resistance.

We obtained gene expression profiles from two breast
cancer cell lines. MCF7/LCC1 is an estrogen-independent but
antiestrogen-responsive variant of the MCF-7 human breast
cancer cell line [14], [15]. An antiestrogen-resistant variant
(MCF7/LCC9) was obtained by stepwise selection of MCF7/LCC1
cells against the steroidal antiestrogen ICI 182,780
(trade name: Faslodex). MCF7/LCC9 cells have many of the
characteristics seen in antiestrogen-resistant human breast cancers
and provide a novel model in which to study antiestrogen
resistance [14].

Gene expression profiles were obtained using the Atlas™
Human Array cDNA expression microarrays (Clontech
Laboratories, Inc., Palo Alto, CA). These microarrays are
produced on nylon filters and contain 588 target genes and
nine housekeeping genes. Briefly, total RNA was obtained
from independent cultures of MCF7/LCC1 and MCF7/LCC9
cells with the TRIzol reagent (Life Technologies, Grand Island,
NY). One μg of DNase-treated mRNA was primed with Clontech's
cDNA Synthesis Primer mix and the product reverse
transcribed into radiolabeled cDNA with [α-32P]dATP (Amersham
Life Science Inc., Arlington Heights, IL). Probes were
purified and denatured, and both C0t-1 DNA and 1 M NaH2PO4
(pH 7.0) were added to the denatured probe. Each microarray
was prehybridized with 5-ml ExpressHyb buffer and 0.5-mg
denatured DNA from sheared salmon testes. Microarray filters
were hybridized overnight with the appropriate [α-32P]-labeled
cDNA probe. The array was extensively washed and sealed in
plastic, with signals detected by phosphorimage analysis using
a Molecular Dynamics Storm phosphorimager (Molecular Dynamics,
Sunnyvale, CA). Digitization of these signals provided
numerical values representing the signal for each gene.

Generally, it has been assumed that, under variable conditions,
the expression of housekeeping genes remains unchanged.
Hence, high-throughput differential expression data can rely on
these genes for data normalization. However, recent data indicate
deviation from this concept [11].

To assess the effectiveness of housekeeping genes in normalizing
cDNA microarray data, a normalization based on
single linear regression is performed using only the set of nine
housekeeping genes suggested by CLONTECH. The scatter
plots of normalization results are given in Fig. 3. Although
log-log-based scatter plots are widely used, we have decided to
use original scaled scatter plots since our numerical simulations
have shown possible misleading perceptions from the "distorted"
shape of the actual data distribution. Particularly focusing
on breast cancer, we have observed significant variations in
the expression of these housekeeping genes. For example,
differential expression was observed between LCC1 and LCC9.
See Fig. 4, where the data sets were normalized using the first
method discussed above. This was observed in all of
our experiments and agrees with the observation reported
in [11]. Therefore, selection and use of housekeeping genes
for normalization of differential expression data from various
biological models should be approached with caution [11].

Fig. 3. (a) Scatter plot of within-class normalized microarray data based on nine housekeeping genes. (b) Scatter plot of between-class normalized microarray
data based on nine housekeeping genes. (Circle: before normalization; dot: after normalization.)

Fig. 4. Example of differential expression of housekeeping genes (red dots, where the dashed lines are the partition edges of the window functions), (a) within-class
and (b) between-class.

Since evaluation requires comparison with existing methods,
we have implemented all three major approaches and applied
these to the same data sets. In this experiment, all genes are
considered as control genes and used in the calculation. Our
measure of normalization accuracy is the MSE defined by (2)
over the selected control gene set. The result of using the first
method is given in Fig. 5, where
$a = \sum_{i=1}^{n_c} y_i / \sum_{i=1}^{n_c} x_i = 9.9$
and $b = 0$; an MSE of 8549 is reached. In the second method,
normalization is based on a linear regression through the origin,
i.e., $a = \sum_{i=1}^{n_c} x_i y_i / \sum_{i=1}^{n_c} x_i^2$,
which is closest to the correct formulation. The corresponding
result is shown in Fig. 6, where $a = 5.0$ and $b = 0$. A lower
MSE of 3905 is obtained, consistent with our theoretical
expectation. In Fig. 7, we show the normalization result using
the third method, i.e.,
$a = (1/n_c) \sum_{i=1}^{n_c} (y_i / x_i)$. As predicted, a
biased estimate of the expression ratio is obtained, leading to
a high MSE of 20 728 with $a = 18$.
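
The three scaling estimates compared above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: `x` and `y` stand for the control gene intensities of the two samples being matched, the error measure is taken to be the mean squared residual of the fit y = a*x + b (the exact form of (2) is not reproduced in this section), and the toy values are hypothetical rather than taken from the LCC1/LCC9 data.

```python
def scale_ratio_of_sums(x, y):
    """Method 1: ratio of total intensities, with offset b = 0."""
    return sum(y) / sum(x)


def scale_regression_through_origin(x, y):
    """Method 2: least-squares slope of a line forced through the origin."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def scale_mean_of_ratios(x, y):
    """Method 3: average of the per-gene ratios y_i / x_i."""
    return sum(yi / xi for xi, yi in zip(x, y)) / len(x)


def mse(x, y, a, b=0.0):
    """Assumed error measure: mean squared residual of y = a*x + b."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical toy intensities (not from the paper).
x = [1.0, 2.0, 4.0, 8.0]
y = [3.0, 4.0, 9.0, 15.0]

a1 = scale_ratio_of_sums(x, y)
a2 = scale_regression_through_origin(x, y)
a3 = scale_mean_of_ratios(x, y)

for name, a in (("method 1", a1), ("method 2", a2), ("method 3", a3)):
    print(name, "a =", round(a, 4), "MSE =", round(mse(x, y, a), 4))
```

Among lines constrained to pass through the origin, the second estimate minimizes this squared error by construction, which is consistent with the ordering of the MSE values reported above.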

These comparisons clearly indicate that the three existing
approaches are not equivalent, as shown by both our experimental
results and the theoretical justification of (12). To illustrate the
impact of using the whole gene set as control gene set and using
a dynamic programming technique on the normalization accuracy,
we applied method 2 to the differential expression between
LCC1 and LCC9. The scatter plot is given in Fig. 8. The
corresponding MSE in this case is 6527, compared to the previous
MSE of 3905. An increase in MSE suggests that, as samples
become more divergent, a good normalization may be achieved
using a subset of constantly expressed genes rather than a global
normalization (e.g., using all genes) [3]. We then used the FS
procedure to estimate both $a$ and $b$. This additional step further
reduced the MSE to 6438, but this reduction is probably not
significant.

Fig. 5. Scatter plot of normalized microarray data using the
existing method 1. (Circle: before normalization; dot: after
normalization.)

Fig. 6. Scatter plot of normalized microarray data using the
existing method 2. (Circle: before normalization; dot: after
normalization.)

Fig. 7. Scatter plot of normalized microarray data using the
existing method 3. (Circle: before normalization; dot: after
normalization.)

Fig. 8. Scatter plot of normalized between-class microarray data
using the existing method 2. (Circle: before normalization; dot:
after normalization.)

Fig. 9. Scatter plot of normalized within-class microarray data
based on selected control gene subset using a static window
function. (Circle: before normalization; dot: after
normalization.)

To explore the effect of control gene selection, we first
performed an initial linear regression using the whole gene set. Four
different window functions were configured to select control
gene subsets, each defined by lower and upper bounds on $r$ and
by the sector angle $\phi$ of the window function. Based on the
selected control genes, we then applied a single linear
regression to normalize within-class samples. A numerical
comparison of the normalization accuracy obtained with different
control gene subsets is reported in Table I. The main feature to
note from these results is that, for different window functions,
a stable estimate of the scaling factor $a$ is obtained, while
the shifting offset $b$ varies significantly from case to case.
In addition, the MSEs of normalizations in all three cases are
comparable (i.e., 5632-5796). The
scatter plot of the best normalization result is shown in Fig. 9.
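
A static window selection followed by a single linear regression might look like the sketch below. The geometry of the window is an assumption made for illustration: each gene is treated as a point (x, y), `r` is its distance from the origin, and the sector of angle `phi` is centred on the direction of the initial regression line. The helper names and the toy data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def in_window(xi, yi, a0, r_min, r_max, phi):
    """Assumed window: r_min < r < r_max and within phi/2 of the line angle."""
    r = math.hypot(xi, yi)
    theta = math.atan2(yi, xi)
    return r_min < r < r_max and abs(theta - math.atan(a0)) < phi / 2


# Hypothetical data: 18 genes on y = 2x + 1 plus two differentially
# expressed genes that should not be used as controls.
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi + 1.0 for xi in x]
y[4] = 60.0
y[10] = 2.0

a0, b0 = fit_line(x, y)  # initial regression on the whole gene set
selected = [i for i in range(len(x))
            if in_window(x[i], y[i], a0, 0.0, 50.0, math.pi / 4)]
a, b = fit_line([x[i] for i in selected], [y[i] for i in selected])
print(len(selected), round(a, 3), round(b, 3))
```

In this toy case the two outlying genes fall outside the sector, so the refit on the remaining genes recovers the underlying scaling and offset.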

We further applied the same procedure to processing
between-class samples and observed similar data characteristics.
The  scaling  factor  in  this  case  is  about  a.  =  44>  while  b  varies 
substantially.  Not  surprisingly,  an  increase  in  MSE  is  observed 
(i.e., 6754-7384). Numerical analysis with different window
functions demonstrates the capability of the approach, since
the interim estimate of the linear regression coefficient is very
stable with a satisfactorily low MSE. Indeed, the robustness of
the gene selection step has been observed in all our
experiments. A typical scatter plot of between-class
normalization results, using a window-based control gene
selection, is given in Fig. 10.

TABLE I
NUMERICAL COMPARISONS OF NORMALIZATION RESULTS BASED ON A DESIGNATED SUBSET OF CONTROL GENES WITH DIFFERENT WINDOW CONFIGURATIONS

Window (×10^3), each with its own sector angle $\phi$:  r ∈ (1, 4);  r ∈ (3, 6);  r ∈ (16, 29)
Coefficient:  a = 7.6, b = 8311;  a = 7.7, b = 2885;  a = 7.7, b = 803;  a = 7.7, b = -11378

Fig. 10. Scatter plot of normalized between-class microarray data
based on selected control gene subset using a static window
function. (Circle: before normalization; dot: after
normalization.)

Next, we provide an illustration of the iterative properties
of our normalization algorithm. The sequence in our experiment
shows the iterative recovery of the full linear regression
matching. In this within-class case, 10 000 < r < 24 000
and $\phi = \pi/2, \pi/4, \pi/8, \pi/16, \pi/32, \pi/48$. Each
window shrinking step is combined with one of the FS steps using
the current set of recovered data points. The initial parameters
are estimated based on the whole gene set. The normalization
process converges to a good solution after six iterations.
Figs. 11 and 12 show the scatter plots of initial and final
normalization results. Once the algorithm has converged, the
consistency of the control gene selection is significantly
improved. Moreover, there are no erroneous matches between
control genes for the last two adjacent iterations. The final
control gene subset contains 37 genes. Finally, the MSE of 3892
is in good agreement with the corresponding results of the
existing methods.
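
The shrinking-window iteration can be sketched as follows. This is an illustration under several assumptions rather than the algorithm as published: the window is measured in polar coordinates about the origin and centred on the angle of the current regression line, an ordinary least-squares refit stands in for the FS step (whose details are not given in this section), and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def iterative_normalization(x, y, r_min, r_max, phis):
    """Alternate control gene selection and refitting while phi shrinks."""
    a, b = fit_line(x, y)  # initial parameters from the whole gene set
    selected = list(range(len(x)))
    for phi in phis:
        center = math.atan(a)  # direction of the current regression line
        selected = [
            i for i in range(len(x))
            if r_min < math.hypot(x[i], y[i]) < r_max
            and abs(math.atan2(y[i], x[i]) - center) < phi / 2
        ]
        # Assumes at least two genes with distinct x remain in the window.
        a, b = fit_line([x[i] for i in selected], [y[i] for i in selected])
    return a, b, selected


# Hypothetical data: 18 genes on y = 2x plus two differentially expressed genes.
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi for xi in x]
y[4] = 60.0
y[10] = 2.0

phis = [math.pi / 2, math.pi / 4, math.pi / 8,
        math.pi / 16, math.pi / 32, math.pi / 48]
a, b, selected = iterative_normalization(x, y, 0.0, 100.0, phis)
print(round(a, 3), round(b, 3), len(selected))
```

In this toy run the widest window still admits one outlying gene, and it is excluded once the sector narrows, after which the selected subset and the fitted coefficients no longer change between adjacent iterations.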

We next considered the iterative normalization for
between-class samples. As a step toward improving the performance
of microarray data normalization, we have put considerable effort
into conducting various studies and developing reliable control
gene selection and linear regression techniques. More precisely,
we aim to perform an unsupervised normalization when
confronted with unreliable housekeeping genes. Experience
suggests that our newly proposed method can achieve this goal.
We applied our algorithm to the differential expression between
LCC1 and LCC9. In this between-class case, 10 000 < r <
24 000 and $\phi = \pi/4, \pi/8, \pi/16, \pi/48$. As before, the
initial parameters are estimated based on the whole gene set. The
normalization process converges on a good solution after only
four iterations. Figs. 13 and 14 show the scatter plots of
initial and final normalization results. The final control gene
subset contains 43 genes, and a stable and satisfactory MSE of
6523 is reached.

Fig. 11. Scatter plot of initial normalized within-class
microarray data based on selected control gene subset using a
dynamic window function. (Circle: before normalization; dot:
after normalization.)

Fig. 12. Scatter plot of final normalized within-class microarray
data based on selected control gene subset using a dynamic window
function. (Circle: before normalization; dot: after
normalization.)

Fig. 13. Scatter plot of initial normalized between-class
microarray data based on selected control gene subset using a
dynamic window function. (Circle: before normalization; dot:
after normalization.)

Fig. 14. Scatter plot of final normalized between-class
microarray data based on selected control gene subset using a
dynamic window function. (Circle: before normalization; dot:
after normalization.)

Fig. 15. Projection of 597-gene dimensions onto top three
principal discriminant component spaces based on Fisher's scatter
matrix measure of the separability of patterns. With an accurate
data normalization, visual exploration reveals phenotype-specific
sample clusters in gene expression space.

Finally, we used our previously developed VISDA
algorithm to display the expression patterns of different cell
line samples in the gene expression space [13]. All data were
normalized using the new method. For a molecular analysis
of breast cancer, the profile of microarray expression is the
molecular signature of interest. Each sample is represented as a
point in a d-dimensional gene expression space in which each
axis represents the expression level of one gene. The presence
of well-separated sample groups implies that the representations
of samples within the same group are close to each other in this
gene expression space but distant from those of other samples.
Thus, the representations of phenotype-specific samples form
clusters. Fig. 15 shows a projected display of 597-gene
dimensions into the top three principal discriminative component
spaces, based on Fisher's scatter matrix [9]. With an accurate
data normalization, visual exploration reveals three
phenotype-specific sample clusters in gene expression space.
Using the trace of Fisher's scatter matrix as a measure of the
separability of patterns, our new normalization method achieved
an improved performance with respect to the existing methods.
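
One common form of the scatter-matrix separability measure is the trace of the inverse within-class scatter times the between-class scatter. The sketch below assumes that form; the text does not spell out which variant was used, so the function name, the use of a pseudo-inverse, and the toy samples are all illustrative assumptions.

```python
import numpy as np


def fisher_separability(samples, labels):
    """Return trace(pinv(S_w) @ S_b) for labeled samples (rows = samples)."""
    X = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    dim = X.shape[1]
    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        s_w += centered.T @ centered
        diff = (class_mean - overall_mean).reshape(-1, 1)
        s_b += len(Xc) * (diff @ diff.T)
    # The pseudo-inverse tolerates a singular S_w, which is expected when
    # there are far more genes than samples.
    return float(np.trace(np.linalg.pinv(s_w) @ s_b))


# Hypothetical two-gene example: the same cluster shape, shifted by
# a small or a large amount to form the second class.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = [0, 0, 0, 0, 1, 1, 1, 1]
j_near = fisher_separability(np.vstack([square, square + 1.0]), labels)
j_far = fisher_separability(np.vstack([square, square + 10.0]), labels)
print(j_near, j_far)
```

A larger value indicates tighter, better-separated groups, so the same measure can be used to compare how well sample clusters separate after different normalization methods.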

One important consideration with the present approach is the
measure of quality in data normalization [11]. This is not a
glamorous area, but progress in it is critical for the future
success of data normalization [12]. What is the correct control
gene set for a direct normalization of between-class data sets?
How effective was a particular normalization method? Did the
succeeding analysis come to the correct conclusion? Assigning
benchmark criteria in data normalization is difficult
[5]. We believe that in data normalization, there is currently no
objective measure of quality, and so it is difficult to quantify
the merit of a particular data normalization technique. The
effectiveness of such a technique is often highly data-dependent.
However, we would expect this iterative normalization method
to be an effective tool in many gene microarray applications.

ABSTRACT

Purpose: Gene expression microarray technologies have
the potential to define molecular profiles that may identify
specific phenotypes (diagnosis), establish a patient's
expected clinical outcome (prognosis), and indicate the
likelihood of a beneficial effect of a specific therapy (prediction).
We wished to develop optimal tissue acquisition, processing,
and analysis procedures for exploring the gene expression
profiles of breast core needle biopsies representing cancer
and noncancer tissues.

Experimental Design: Human breast cancer xenografts
were used to evaluate several processing methods for
prospectively collecting adequate amounts of high-quality RNA
for gene expression microarray studies. Samples were
assessed for the preservation of tissue architecture and the
quality and quantity of RNA recovered. An optimized
protocol was applied to a small study of core needle breast
biopsies from patients, in which we compared the molecular
profiles from cancer with those from noncancer biopsies.
Gene expression data were obtained using Research Genetics,
Inc. NamedGenes cDNA microarrays. Data were visualized
using simple hierarchical clustering and a novel principal
component analysis-based multidimensional scaling.
Data dimensionality was reduced by simple statistical
approaches. Predictive neural networks were built using a
multilayer perceptron and evaluated in an independent data
set from snap-frozen mastectomy specimens.

Results: Processing tissue through RNAlater preserves
tissue architecture when biopsies are washed for 5 min on
ice with ice-cold PBS before histopathological analysis. Cell
margins are clear, tissue folding and fragmentation are not
observed, and integrity of the cores is maintained, allowing
optimal pathological interpretation and preservation of
important diagnostic information. Adequate concentrations of
high-quality RNA are recovered; 51 of 55 biopsies produced
a median of 1.34 µg of total RNA (range, 100 ng to 12.60
µg). Snap-freezing or the use of RNAlater does not affect
RNA recovery or the molecular profiles obtained from
biopsies. The neural network predictors accurately
discriminate between predominantly cancer and noncancer breast
biopsies.

Conclusions: The approaches generated in these studies
provide a simple, safe, and effective method for
prospectively acquiring and processing breast core needle biopsies
for gene expression studies. Gene expression data from these
studies can be used to build accurate predictive models that
separate different molecular profiles. The data establish the
use and effectiveness of these approaches for future
prospective studies.

INTRODUCTION 

The emerging gene microarray technologies provide
powerful new methodologies with which to address several
important issues in breast cancer research. For example, it should be
possible to define gene expression patterns that can identify
specific phenotypes (diagnosis), establish a patient's expected
clinical outcome (prognosis), and indicate the likelihood of a
beneficial effect of a specific therapy (prediction; Refs. 1 and 2).
Gene microarray technologies are performed on chips, glass
slides, or filters and allow the comparison of gene expression
profiles from two or more tissues or the same tissue in different
biological states (3). The technologies continue to develop, with
considerable discussion regarding which technology has the
greatest potential to address the molecular profiling of tumors.
Each of the major approaches has advantages and
disadvantages, but the most important consideration is the ability of the
technology to address the chosen hypothesis (4). Overall, there
is no compelling evidence of major differences in the accuracy
or reproducibility of the various microarray platforms (4-6).
Studies that directly compare the nylon-based cDNA arrays with
either glass slide cDNA arrays and/or oligonucleotide chips
consistently report that these platforms produce comparable data
(5-8).

Because gene expression technologies provide an
assessment of mRNA abundance in a sample, all require the
production of a probe, labeled with either a radioactive
nucleotide or fluorescent molecule, generated from either the
total or polyadenylate RNA isolated from the sample.
Currently, it is not possible to isolate adequate concentrations of
high-quality RNA from what would otherwise be the most
abundant source: the formalin-fixed, paraffin-embedded
tumor specimens available in established tumor banks. Only
fresh or appropriately frozen tissues provide the necessary
quality of RNA for the preparation of probes to hybridize to
existing gene expression microarrays.

Whereas many institutions have frozen tumor banks, these
may be of limited use in obtaining reproducible gene expression
profiles for some breast cancers. For example, most are heavily
biased toward large breast tumors (T3-T4). These tumors are
poorly representative of the small tumors now seen in many
patients for initial diagnosis (9). A further concern with existing
frozen tissue banks is the frequent lack of a standardized
approach for tissue acquisition and processing. Tissue handling
between excision and freezing can vary considerably. For
example, some tumors are frozen within seconds of excision, and
others are placed on wet or dry ice after excision, whereas some
may stand for many minutes at room temperature before being
placed in liquid nitrogen. The importance of tissue processing is
often critical for assessing various end points and can affect both
RNA stability for RNA in situ hybridizations and antigen
stability/accessibility for immunohistochemistry (10).

The effect of tissue acquisition and processing on gene
microarray data has not been widely addressed. Nonetheless,
this is likely to be important for at least two critical parameters.
First is preservation of high-quality RNA. Most investigators
acknowledge the importance of using only pure, high-quality
RNA for gene microarray studies (11). The second factor is
maintenance of a tissue's gene expression profile. For example,
hypoxia- or stress-induced responses can be induced in
metabolically active cells. Oxygen deprivation begins with the loss of
tissue perfusion occurring upon excision. This deprivation can
trigger a hypoxic response, characterized by the altered
expression of specific genes (12, 13). Several of these genes are
transcription factors that further affect the expression of their
target genes (13).

One problem with these two factors is that both can affect
a sample, but RNA could still be obtained, a probe could still be
generated, and a molecular profile could still be obtained after
hybridization to a gene expression microarray. Subtle changes
that are time, temperature, pH, and/or oxygen dependent could
occur with sufficient variability that they are almost impossible
to detect reproducibly. Some tumors with high metabolic
activity may be more sensitive to hypoxia, producing a statistically
valid and biologically plausible clustering that could have
resulted more from tissue processing than from tissue biology.
Where such changes are subtle, expression profiles might still
appear grossly similar, complicating an assessment of tissue
processing artifacts.

Given the bias of existing banks and the potential
differences in tissue processing, many important questions in breast
cancer biology may require prospective study designs. Such
study designs are more valid for the exploration or validation of
new predictive and prognostic factors. Whereas optimized tissue
acquisition and processing strategies for prospective studies
offer the opportunity for greater control of tissue quality than
retrospective studies, these strategies have not been described.
In this study, we wished to develop a standard tissue
acquisition/processing method for prospective core needle breast biopsy
sampling. This method should avoid the initial use of liquid
nitrogen, preserve tissue architecture, and provide adequate
concentrations of high-quality RNA for microarray analysis. We
now report a simple tissue processing approach using a
commercially available reagent (RNAlater) that is applicable to
prospective studies on core needle biopsies. RNA obtained from
this approach was compared with RNA from snap-frozen human
breast biopsies of neoplastic and nonneoplastic tissues, gene
expression microarray data were obtained, and an accurate
neural network capable of discriminating between these tissues was
built and validated in an independent data set.

MATERIALS  AND  METHODS 

Breast Cancer Xenograft Studies. MDA-MB-231
cells were inoculated into athymic nude mice as described
previously (14, 15). Mice were sacrificed, and tumor tissue
was obtained using sterile scissors and forceps. Needle
biopsies were taken from the excised xenografts and placed into
separate tubes containing 0.5 ml of RNAlater (Ambion,
Austin, TX) at room temperature. Samples were stored at
various temperatures for 72 h and subsequently processed
according to the scheme in Table 1. Each experimental
condition was explored in duplicate samples. Tissues were
embedded in OCT (BDH; Poole, Dorset, United Kingdom), and
standard frozen sections were prepared from each sample.
Subsequently, sections were stained with H&E and evaluated
by the study pathologist. The remainder of the core was
stored at -80°C, and total RNA was extracted for evaluation.
All animal studies were performed under protocols approved
by the Georgetown University Animal Care and Use
Committee.

Patient Population. Patients undergoing a diagnostic
core needle or excisional biopsy at Georgetown University
Hospital were eligible for the tissue acquisition protocol, in
which additional cores were obtained for study purposes. All
patients signed a written consent form approved by the
Georgetown University Medical Center Institutional Review Board.
Core biopsies provided by the radiologists were obtained with
either mammographic or ultrasound guidance. Core biopsies
obtained by the surgeons were obtained either after surgical
exposure of the tumor or during a routine needle biopsy. A total
of 1-4 cores were obtained from each patient for study
purposes, depending on the size of the breast lesion. In addition,
nine frozen breast tumor specimens were obtained from the
Department of Oncology, University of Edinburgh (Edinburgh,
Scotland, United Kingdom) for use in testing the neural
networks for accuracy in identifying tissues as malignant or
nonmalignant. These samples were collected after appropriate
patient consent and consistent with the relevant United Kingdom
legislation. In this study, the pathologist was blinded to all
clinical information on all samples.

Collection and Handling of Human Breast Core Biopsies
for Microarray Analysis. Generally, 1-4 core needle
biopsies (14-gauge needle) were obtained from each consenting
patient. Random cores were immediately snap-frozen in liquid
nitrogen; others were individually placed in separate cryo-tubes
containing 0.5 ml of RNAlater solution. Snap-frozen tissues
were placed directly in liquid nitrogen from the core biopsy
needle, immediately upon removal from the patient. For the
RNAlater samples, core biopsies were placed in 500 µl of
RNAlater and maintained at 4°C for 24 h before snap-freezing.
Each tube was labeled with the patient's name, hospital number,
and study number. Frozen samples were transferred to the
Lombardi Cancer Center's Tissue and Histopathology Shared
Resource (Washington, DC) for processing.

Before removing the samples from the tube for frozen
section preparation, each sample was washed for 5 min on ice
with 500 µl of ice-cold sterile PBS (RNase free); otherwise,
samples in RNAlater will not freeze in the cryostat. Each core
biopsy sample was then embedded separately in an OCT block.
A frozen section was taken, stained with H&E, and examined by
the study pathologist. OCT-embedded samples were maintained
frozen at -70°C until the analysis of the main tumor mass was
complete.

The  study  pathologist  evaluated  all  biopsies  to  determine 
the  presence  of  invasive  cancer  and  to  estimate  the  relative 
amounts  of  normal  epithelium,  stroma,  and  fat.  Because  samples 
were  to  be  used  for  microarray  analysis,  the  percentage  of 
invasive cancer, normal epithelium, stroma, and fat was
estimated relative to cell nuclei only. Provided this histological
review  offered  no  new  clinical  information  important  for  patient 
care,  biopsies  suitable  for  microarray  were  identified.  In  this 
manner,  tissue  for  expression  microarray  analysis  was  ensured 
to  be  of  no  new  diagnostic  relevance.  This  determination  is 
important  because  RNA  extraction  destroys  tissue  architecture. 
If  the  samples  had  contained  information  that  modified  the 
surgical  pathology  diagnosis,  these  biopsies  would  not  have 
been  used.  This  situation  did  not  occur  in  this  study. 


Once released for study, all patient identifiers were
removed from each sample. The link between patient identifiers
and study identifiers was held in a confidential database. Access
to this database was reserved only for the clinical study principal
investigator and the data entry technician. The frozen clinical
material, mostly frozen in OCT, was directly provided to the
research laboratory for storage and/or processing. Upon receipt
in the research laboratory, tissue was either stored at -80°C or
processed immediately for RNA extraction.

Preparation and Quality Assessment of RNA from Frozen
Tissues. Frozen tissue was placed in a 1 × 1-inch
plastic bag on dry ice and pulverized, and lysis buffer from
the Qiagen RNeasy kit was added (Qiagen, Inc., Valencia,
CA). Each sample was then transferred to a 1.5-ml centrifuge
tube, homogenized with a 1-ml syringe and an 18-gauge
needle, added to the Qiagen spin column, and centrifuged to
bind the RNA to the matrix. The column was washed with the
buffers provided in the kit, and the RNA was finally eluted
with distilled H2O. RNA concentrations were determined by
comparing the absorbance ratios (A260 nm/A280 nm) obtained
spectrophotometrically using a Beckman DU 640
Spectrophotometer (Beckman, Fullerton, CA).
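
As a worked illustration of the spectrophotometric readout, the sketch below converts absorbance readings into a concentration and a purity ratio. It relies on the widely used convention that one A260 unit corresponds to roughly 40 µg/ml of RNA; that factor, the dilution, and the readings are assumptions for illustration and are not values reported in this study.

```python
def rna_concentration_ug_per_ml(a260, dilution_factor=1.0, ug_per_ml_per_a260=40.0):
    """Estimate RNA concentration from the absorbance at 260 nm.

    The default conversion factor (40 µg/ml per A260 unit) is the usual
    convention for RNA and is an assumption here.
    """
    return a260 * ug_per_ml_per_a260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio, commonly used as a rough indicator of RNA purity."""
    return a260 / a280


# Hypothetical readings for a sample diluted 1:50 before measurement.
concentration = rna_concentration_ug_per_ml(0.25, dilution_factor=50.0)
ratio = purity_ratio(0.25, 0.125)
print(concentration, ratio)
```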

Because  using  standard  gel  electrophoresis  to  assess  RNA 
quality  would  require  almost  the  entire  RNA  sample,  we  used  an 
Agilent  2100  analyzer  and  RNA  6000  LabChip  kits  (RNA 
microelectroseparation  and  analysis;  Agilent  Technologies, 
New  Castle,  DE).  A  total  of  100  ng  of  each  RNA  sample  was 
loaded/well.  The  analyzer  allows  for  visual  examination  of  both 
the  18S  and  28S  rRNA  bands  as  a  measure  of  RNA  integrity. 

Probe Generation for Gene Microarray Hybridizations.
Probes  were  generated  as  described  previously  (16).  This 
method  radiolabels  both  the  sense  and  antisense  probe  strands 
and  further  increases  probe-specific  activity  by  incorporating 
two  radiolabeled  nucleotides.  Thus,  tumors  can  be  arrayed  on 
nylon  filter  arrays  with  as  little  as  100  ng  of  total  RNA  and 
without  RNA  amplification  (7,  16).  Whereas  an  adequate  signal 
is  generated  with  100  ng  of  total  RNA,  the  use  of  very  low  RNA 
concentrations  will  likely  affect  the  ability  to  adequately  and 
reproducibly  detect  many  lower  abundance  mRNAs.  We  used 
500  ng  of  total  RNA,  which  is  sufficient  to  allow  the  use  of 
approximately  70%  of  breast  needle  biopsies  without  either 
RNA  amplification  or  pooling.  None  of  the  RNAs  was  amplified 
or  pooled  in  the  current  study. 

To synthesize the labeled cDNA probe, 500 ng of total
RNA were incubated at 70°C for 10 min with 2 mg of
oligodeoxythymidylate and then chilled on ice for 2 min. The
primed DNA was incubated at 37°C for 90 min in a solution
containing 1X first strand buffer, 3 mM DTT, 1 mM dGTP/dTTP,
300 units of reverse transcriptase, 50 mCi of [33P]dCTP, and
50 mCi of [33P]dATP. The second strand was synthesized by
adding 1X reaction buffer, 100 units of DNA polymerase I,
500 ng of random primers, 1 mM dGTP/dTTP, 50 mCi of
[33P]dCTP, and 50 mCi of [33P]dATP. The reaction was
incubated for 2 h at 16°C. The radiolabeled probe was purified
using a BioSpin-6 chromatography column (Bio-Rad) and
denatured by boiling for 3 min. The purified probe was added
to the hybridization roller tube containing the prehybridized
GeneFilter and incubated for 12-18 h at 42°C in a Robin
Scientific Roller Oven. For these studies, the NamedGenes
Fig. 1 The quality of RNA recovered from human core breast needle
biopsies. RNA was evaluated using an Agilent 2100 analyzer. A,
xenografts; B, breast core needle biopsies arrayed in Figs. 5 and 6
and further characterized in Table 4. In the images displayed,
fluorescence scales are not equivalent between lanes but have been
normalized for clarity.


Fig. 2 MDA-MB-231 human breast tumor xenografts processed for frozen
tissue sectioning in RNAlater. A, no wash; B, washed in PBS:RNAlater
(1:6; v/v) for 2 h at 4°C; C, washed in PBS:RNAlater (1:6; v/v) for
5 min at 4°C; D, washed in PBS for 5 min on ice.


filters (Research Genetics, Inc., Huntsville, AL) were used.
Each filter contains 4032 known genes, 192 housekeeping
genes, and 192 control genes. Each hybridized
GeneFilter was washed twice in 2X SSC, 1% SDS at 50°C
for 20 min and once at 55°C in 0.5X SSC, 1% SDS for 15
min. Hybridization signals were detected by phosphorimaging
using a Molecular Dynamics Storm PhosphorImager
(Molecular Dynamics, Sunnyvale, CA). The sensitivity and
reproducibility of these and other nylon filter-based cDNA
microarrays have been widely reported (7, 17-20).


Normalization of Data. Pathways software algorithms
(Research Genetics, Inc.) were used to correct for nonspecific
binding of the probe to the filter (background correction).
Approaches for signal normalization, intended to correct for
differences in probe specific activities, hybridizations, and other
interexperiment variables, are diverse (11). In the present study,
the average of all data points was used to calculate a
normalization factor; the normalized intensity value for each spot was
obtained by multiplying the normalization factor by the raw
intensity (11).
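
As an illustration only, the normalization step can be sketched in Python. The text does not state how the normalization factor was derived from the average of all data points, so this sketch assumes the factor is a chosen target mean divided by the filter's mean intensity. The function name `normalize_filter`, the `target_mean` parameter, and the toy intensities are hypothetical and are not part of the Pathways software.

```python
from statistics import fmean


def normalize_filter(raw_intensities, target_mean=1.0):
    """Scale background-corrected spot intensities from one filter.

    Assumption (not stated in the text): the normalization factor is
    target_mean divided by the mean of all spots on the filter.
    """
    overall_mean = fmean(raw_intensities)
    if overall_mean <= 0:
        raise ValueError("mean intensity must be positive")
    factor = target_mean / overall_mean
    # Normalized intensity = normalization factor x raw intensity.
    return [factor * value for value in raw_intensities]


# Two hypothetical filters with the same expression profile but a
# twofold difference in overall signal (e.g., probe specific activity).
filter_a = [120.0, 80.0, 400.0, 200.0]
filter_b = [60.0, 40.0, 200.0, 100.0]

norm_a = normalize_filter(filter_a)
norm_b = normalize_filter(filter_b)
```

Under this assumption, the two filters give the same normalized profile, which is the intended effect of correcting for interexperiment differences in overall signal.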
Fig. 3 Human breast needle biopsies processed for frozen tissue
sectioning in RNAlater. A, no wash; B, washed in PBS:RNAlater (1:6;
v/v) for 2 h at 4°C; C, washed in PBS:RNAlater (1:6; v/v) for 5 min
at 4°C; D, washed in PBS for 5 min on ice.
Fig. 4 Optimized tissue acquisition/processing procedure for breast
needle biopsies.
Analysis of Gene Microarray Data. The optimal approach
for analyzing the high dimensional gene expression data
generated by gene microarray studies remains unclear. The high
dimensionality of these data is problematic, with most existing
analyses functioning more accurately in low dimensionality
(21). However, rather than making statistical inference for
identifying and studying functionally relevant genes, the study goal
was to validate the tissue acquisition and processing methods
and demonstrate the applicability of this approach for building
clinically relevant predictive models.

Recently, we devised a simple approach to the exploration
of small studies with two experimental groups.5 Our approach
used simple statistical analyses to reduce data dimensionality
and identify subsets of discriminant genes. This approach is
similar in principle to that used by Hedenfalk et al. (22).
Because the class of each sample (cancer versus noncancer) is


5 Z. Gu. Association of interferon regulatory factor-1, nucleophosmin,
nuclear factor-kappa-B and cAMP binding with acquired resistance to
Faslodex (ICI 182,780), submitted for publication.
Table 2 Concentration of RNA recovered from breast needle biopsies

Biopsies             RNA ≥100 ng(a)   Total RNA recovered (mean ± SE)      Range (≥100 ng)
n = 55(b)            51/55 (93%)      3.63 ± 0.48 μg; median = 1.34 μg     100 ng to 12.60 μg
n = 25 Snap-frozen                    2.04 ± 0.51 μg; median = 1.32 μg     100 ng to 9.00 μg(c)
n = 21 RNAlater                       3.49 ± 0.78 μg; median = 2.70 μg     100 ng to 12.60 μg

(a) Number producing ≥100 ng of RNA, the minimum useful concentration of RNA without amplification, irrespective of the tissue acquisition and processing method applied.
(b) We did not have complete data on processing for the first 9 of the 55 samples.
(c) P = 0.13; Mann-Whitney rank-sum test; RNAlater versus snap-frozen tissue.


Table 3 Characteristics of breast needle biopsy material

A.
Biopsy      Source      ER/PR(a)   % Cancer   % Normal   % Fat   % CT   RNA (μg)(b)
15A(c)      Radiology   ND         0%         70%        0%      30%    3.28
S6A(c)      Surgery     ND         0%         0%         100%    0%     2.49
10A         Radiology   ND         0%         5%         45%     50%    2.07
11A         Radiology   ND         0%         20%        40%     40%    1.36
17(c)       Radiology   ND         2%         0%         90%     8%     6.70
S2A         Surgery     +/+        90%        0%         5%      5%     3.20
S3A(c)      Surgery     +/+        90%        0%         0%      10%    2.70
S10D(c)     Surgery     +/+        80%        0%         0%      20%    6.50
S14B(c)     Surgery     -/-        80%        0%         0%      20%    4.20
S18A        Surgery     +/-        90%        0%         0%      10%    1.70

B. RNA recovered from biopsies used in this study (mean ± SE)(d)
No RNAlater              2.21 ± 0.52 μg total RNA(d)
RNAlater                 3.83 ± 0.73 μg total RNA
Overall RNA recovered    3.10 ± 1.60 μg total RNA

C.
Case   Biopsies   Pathological diagnosis
S2     S2A        Invasive adenocarcinoma
       S2B        Invasive adenocarcinoma
       S2C        Invasive adenocarcinoma
S6     S6A        No cancer
       S6B        No cancer
       S6C        No tissue
S10    S10A       No cancer
       S10B       Possible DCIS
       S10C       No cancer
       S10D       Invasive adenocarcinoma

(a) PR, progesterone receptor; CT, connective tissue; DCIS, ductal carcinoma in situ; ND, not determined.
(b) Total RNA recovered from each needle biopsy. Five hundred ng of each RNA population were used to generate the probes hybridized to obtain the data presented in Fig. 5.
(c) Biopsies processed in RNAlater.
(d) P = 0.129; Student's t test; RNAlater versus no RNAlater.


known from the histopathological analyses, dimensionality can
be reduced in a supervised manner by performing a series of
statistical tests. The major purpose of performing these tests was
only to select a group of genes that would be used for data
visualization and analysis. Student's t test and a t test for
unequal variances (each assumes normal distribution of the
data) and a nonparametric (distribution-free) Wilcoxon test were
used. Whereas the inflated type 1 error will overestimate
significant differences, the incidence of false negative estimates
should be smaller. Because the distribution of the data among
and within replicate experiments and for individual genes
cannot be determined (23), both logarithm-transformed and
nontransformed data were compared.
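
The supervised selection step can be illustrated with a minimal Python sketch. It covers only Student's pooled-variance t test (not the unequal-variance or Wilcoxon tests) and assumes five samples per group, so that the statistic has 8 degrees of freedom and the standard two-sided 5% critical value is 2.306. The gene names and expression values are invented for illustration, and the helper names `student_t` and `select_genes` are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

# Two-sided 5% critical value of Student's t with 8 degrees of freedom
# (5 cancer + 5 noncancer samples - 2).
T_CRIT_DF8_P05 = 2.306


def student_t(group_a, group_b):
    """Pooled-variance (Student's) t statistic for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (fmean(group_a) - fmean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


def select_genes(expression, t_crit=T_CRIT_DF8_P05):
    """Keep genes whose |t| exceeds the critical value (P < 0.05 for 5 vs 5)."""
    return [gene for gene, (cancer, noncancer) in expression.items()
            if abs(student_t(cancer, noncancer)) > t_crit]


# Hypothetical normalized intensities: (cancer samples, noncancer samples).
toy = {
    "gene_up": ([9.0, 10.0, 11.0, 10.5, 9.5], [4.0, 5.0, 6.0, 5.5, 4.5]),
    "gene_flat": ([5.0, 6.0, 4.0, 5.5, 4.5], [5.2, 5.8, 4.1, 5.4, 4.6]),
}
selected = select_genes(toy)
```

Because every gene on the filter is tested without correction for multiple comparisons, a selection of this kind carries the inflated type 1 error noted above; it is a filter for visualization, not a formal inference.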

Two reduced dimensional data sets were selected: one
comprising genes with Ps < 0.05, and one comprising genes
with Ps < 0.02. Because of their marked biological differences,
these phenotypes should be easily separable. Thus, the data were
visualized using our Fisher separability-based multidimensional
scaling approach that projects high dimensional data into
three-dimensional data space (24, 25). Because it has become widely
used, visualization using the simple hierarchical clustering
described by Eisen et al. (26) is also presented.
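
The hierarchical clustering shown in Figs. 5B and 6B is based on a 1 − Pearson's coefficient matrix. A minimal sketch of how such a distance matrix could be computed is given below; it is not the authors' software, the clustering step itself is omitted, and the sample names and expression values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def distance_matrix(samples):
    """Pairwise 1 - Pearson distances between sample expression profiles."""
    names = list(samples)
    return {(p, q): 1.0 - pearson(samples[p], samples[q])
            for p in names for q in names}


# Hypothetical profiles over four genes; cancer_2 is cancer_1 scaled twofold.
profiles = {
    "cancer_1": [3.0, 2.5, 0.4, 0.5],
    "cancer_2": [6.0, 5.0, 0.8, 1.0],
    "noncancer_1": [0.5, 0.4, 2.5, 3.0],
}
dist = distance_matrix(profiles)
```

A distance near 0 indicates profiles with the same shape regardless of overall signal, and a distance near 2 indicates opposite profiles, which is why this measure is commonly paired with hierarchical clustering of expression data.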

Generation and Testing of a Neural Network. To
determine whether the genes we selected could be used to separate
cancer from noncancer tissues, a neural network was trained
using the gene expression microarray data from five cancer
biopsies and five noncancer biopsies. Neural networks can be
considered as parallel computing systems consisting of many
simple processors with many interconnections. The main
advantages of neural networks are that they can learn complex
nonlinear input-output relationships, use sequential training
procedures, and adapt themselves to the data (27, 28).

The learning process involves updating network architecture
and connection weights so that the predictive model can
efficiently perform a specific classification task. We used a
multilayer perceptron to design a nonlinear neural classifier,
using each gene's expression level in the tissue samples
as the input and the cancer versus noncancer phenotype of each
sample as the output. Consequently, the network output comes
to approximate the posterior Bayesian probabilities of a sample
being either cancer or noncancer given its gene expression
profile (27, 29, 30). Three experimental configurations were
tested for each data set, using either the top 40, 80, and 103
dimensions (data set selecting the top 103 genes; P < 0.05) or
the top 10, 20, and 30 dimensions (data set selecting the top 30
genes; P < 0.02). These top genes were selected based on their
fold difference between cancer and noncancer and their
respective Ps. Two prediction models were built, one with 3
hidden nodes and 8 inputs and one with 5 hidden nodes and 18
inputs. Mean-squared error estimates were used to explore
network performance. The "leave-one-out" method was used for the
initial testing and training of each neural network (27, 29, 30).
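
The leave-one-out procedure can be sketched as follows. To keep the example short and deterministic, a nearest-centroid classifier is substituted for the multilayer perceptron used in the study, so this illustrates only the validation loop and not the network itself. The two-gene profiles and the function names are invented for illustration.

```python
from statistics import fmean


def centroid(rows):
    """Per-gene mean of a list of expression profiles."""
    return [fmean(column) for column in zip(*rows)]


def nearest_centroid_predict(train, sample):
    """Return the label whose class centroid is closest to the sample.

    This is a stand-in classifier, not the multilayer perceptron.
    """
    best_label, best_dist = None, float("inf")
    for label in sorted({lab for _, lab in train}):
        center = centroid([feat for feat, lab in train if lab == label])
        dist = sum((a - b) ** 2 for a, b in zip(sample, center))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_error(data):
    """Hold out each sample in turn, train on the rest, count mistakes."""
    errors = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_centroid_predict(train, features) != label:
            errors += 1
    return errors / len(data)


# Hypothetical two-gene profiles for five cancer and five noncancer biopsies.
toy = [
    ([3.1, 0.5], "cancer"), ([2.9, 0.6], "cancer"), ([3.3, 0.4], "cancer"),
    ([2.7, 0.7], "cancer"), ([3.0, 0.5], "cancer"),
    ([1.0, 1.9], "noncancer"), ([0.8, 2.1], "noncancer"),
    ([1.2, 2.0], "noncancer"), ([0.9, 1.8], "noncancer"),
    ([1.1, 2.2], "noncancer"),
]
loo_error = leave_one_out_error(toy)
```

With only 10 samples, each held-out prediction is made by a model trained on the other 9, so the error estimate uses all of the data without testing any sample on a model that has seen it.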

RESULTS 

RNA Quality and Tissue Architecture from Xenograft
Tissues Processed Using RNAlater. Recovery of high-quality
RNA was optimal when OCT-embedded tissue samples were
removed from frozen blocks using a small volume of
RNAlater, which thawed and softened the embedding medium
before tissue extraction from the blocks. Thus, the frozen block
was placed in a small plastic tray, with the embedded tissue
facing up, and 500 μl of RNAlater were pipetted on top. Using
this method, seven of eight samples yielded high-quality RNA
(Fig. 1A; RNA integrity analyzed using the Agilent 2100
bioanalyzer and RNA 6000 LabChip kit). In previous experiments,
where the OCT block was dissolved by vigorous shaking in a
large volume of PBS and tissue fragments were recovered with
a strainer, only 6 of 12 samples yielded fully intact RNA (data
not shown).

Pathology was not interpretable from material frozen
directly in RNAlater and transferred to OCT without wash steps
(Fig. 2A). This reflects inadequate freezing in the cryostat and
consequent tissue folding during the cutting process. When
washed in PBS:RNAlater (1:6) for 2 h at 4°C, the tissue did not
fold on cutting, but cell outlines appeared blurred, making
pathological interpretation difficult (Fig. 2B). Washing in
PBS:RNAlater (1:6) for 5 min at 4°C also eliminated tissue folding,
but now the cell outlines appeared distinct. Nonetheless, tissue
fragmentation occurred in some specimens, making pathological
interpretation suboptimal (Fig. 2C). Optimal preservation of
tissue architecture was obtained by washing tissue for 5 min on
ice with ice-cold PBS. Cell margins were clear, tissue folding
and fragmentation were not observed, and the integrity of the
cores was maintained, allowing optimal pathological
interpretation (Fig. 2D). Similar data were obtained from a human breast
core biopsy released to this study (Fig. 3, A-D). A scheme of the
optimized tissue acquisition protocol is shown in Fig. 4.
Fig. 5 Structure of the gene expression data using the top 103 genes. A,
projection from top 103 genes (103 dimensions), selected by t test
comparisons of log10-transformed gene expression data (P ≤ 0.05) and
projected into three dimensions; B, hierarchical clustering of samples in
2 dimensions based on 1 − Pearson's coefficient matrix of the top 103
genes. Snap-frozen tissue: ○, noncancer; △, cancer.
RNAlater-processed tissue: ●, noncancer; ▲, cancer.


Recovery of High-quality RNA from Human Breast
Biopsies for Gene Microarray Studies. Cores were removed
from OCT by placing the frozen block in a small plastic tray, with
the embedded tissue facing up, and pipetting 500 μl of RNAlater
on top. Intact cores were easily picked out of the OCT, which
remained semisolid, using a sterile pipette tip. From a study of 55
breast needle biopsies, we obtained >100 ng of RNA from 51
samples (93%; Table 2). The median value (1.34 μg) shows that most
biopsies produce sufficient RNA to generate data using 500 ng of
total RNA. There was no significant difference between frozen and
RNAlater-processed biopsies in the mean concentrations of total
RNA recovered (Tables 2 and 3). Thus, prospectively collected
breast needle biopsies, either directly snap-frozen or processed in
RNAlater, can produce adequate RNA concentrations for use in
gene microarray studies.

A further requirement of gene expression microarray
experiments is the isolation of high-quality RNA (11). Sufficient
RNA was not recovered to allow for an assessment of RNA
quality by both standard gel electrophoresis methods and gene
microarray studies on the same samples. Because gel
electrophoresis requires ~1 μg of RNA, the Agilent 2100
"lab-on-a-chip" technology was used to assess RNA quality. This
technology requires only 100 ng of RNA to determine quality, with
Table 4 Genes comprising the 30-dimensional data set

A. Genes that appear up-regulated in cancer tissue

                                                        P
Gene                                        C:NC(a)  t test(b)  Unequal  t test log10  Wilcoxon
BTF-2                                       3.2      0.010      0.026    0.001         0.008
p160                                        3.1      0.016      0.030    0.027         0.095
spr2                                        3.0      0.006      0.013    0.007         0.008
Interferon inducible 9-27                   2.9      0.007      0.019    0.003         0.008
Human surface antigen                       2.7      0.018      0.035    0.023         0.056
Grb14                                       2.6      0.004      0.011    0.001         0.008
gp250 precursor                             2.6      0.008      0.009    0.002         0.016
TAK1-binding protein                        2.4      0.014      0.010    0.002         0.008
Myosin-binding protein H                    2.3      0.016      0.022    0.011         0.032
RP3                                         2.3      0.006      0.008    0.004         0.016
α-Catenin                                   2.3      0.001      0.004    0.001         0.008
T3 receptor cofactor-1                      2.3      0.005      0.006    0.015         0.016
DAP-3                                       2.2      0.017      0.021    0.025         0.032
Selenoprotein-W (selW)                      2.1      0.015      0.023    0.018         0.056
Ferroxidase                                 2.1      0.007      0.011    0.008         0.016
Cytochrome c                                2.1      0.017      0.017    0.044         0.032
RAN-BP8                                     2.1      0.017      0.018    0.016         0.032
Aspartate aminotransferase-1                1.9      0.010      0.011    0.009         0.032
unc-18 homologue                            1.9      0.007      0.010    0.005         0.008
Phosphoethanolamine cytidylyltransferase    1.9      0.007      0.007    0.001         0.016
Frezzled (fre)                              1.9      0.016      0.024    0.014         0.016
Interferon α-induced 77.5 kDa               1.8      0.017      0.019    0.017         0.032
Ubiquitin activating enzyme E1              1.8      0.015      0.015    0.011         0.032
Macrophage-stimulating 7                    1.8      0.014      0.014    0.022         0.032
Ah receptor                                 1.8      0.002      0.004    0.002         0.008

B. Genes that appear up-regulated in noncancer tissues

                                                        P
Gene                                        C:NC(a)  t test(b)  Unequal  t test log10  Wilcoxon
Neurofibromin 2                             0.6      0.005      0.005    0.006         0.008
Frizzled-related protein                    0.5      0.016      0.025    0.015         0.031
Type II keratin                             0.4      0.006      0.007    0.027         0.008
CAGH4                                       0.4      0.004      0.006    0.003         0.008
Dihydroguanosine triphosphatase             0.3      0.014      0.033    0.002         0.008

(a) C:NC, ratio of expression level in cancer versus noncancer. Genes were selected on the basis of C:NC ≥ 1.8 (approximately 2-fold); P ≤ 0.02 (estimated to three significant figures) in Student's t test.
(b) t tests used are: t test, Student's (untransformed data); Unequal, unequal variance (untransformed data); t test log10, Student's t test on log10-transformed data; Wilcoxon, Wilcoxon rank-sum test (nonparametric).


specificity comparable with or better than that obtained from
standard gel electrophoresis. Consequently, RNA quality can be
assessed on samples that will later be subjected to gene
microarray analysis. As is evident from Fig. 1B, >90% of representative
biopsies produced high-quality RNA.

Analysis of Core Needle Breast Biopsies and Visualization
of Gene Expression Data. To assess the applicability of
the tissue processing procedure, we obtained total RNA from
five random breast cancer biopsies and five random biopsies of
noncancer tissue (Table 3). All tissues were evaluated by the
study pathologist before release for our studies to ensure that the
investigational cores contained no diagnostically useful
information. Both biopsies processed in RNAlater and biopsies
frozen without RNAlater were analyzed. These biopsies were
approximately equally represented in each group (RNAlater
processed: cancer = 3; noncancer = 3). RNA was prepared, and
probes were generated as described above. The mean RNA
concentrations recovered by both methods were comparable
(see also Table 2). Probes were hybridized to NamedGenes
filters, and signal was measured using a Molecular Dynamics
Storm PhosphorImager. Digitized representations of the
hybridized filter signals were imported into the Pathways software for
background correction and normalization.

Normalized gene expression data were imported into the
visualization algorithm, and scatter plots of the gene expression
data were generated. We first reduced dimensionality by
eliminating noninformative genes. Hence, we excluded those genes
whose expression was not likely to be different between the
cancer and noncancer groups (multiple t tests, P > 0.05). A total
of 103 genes were retained and were used to generate a
three-dimensional (from 103-dimensional) plot of the data (Fig.
5A). The three axes are the first three principal components
fitted to the cancer and noncancer molecular profile data. The
cumulative proportion of the variance captured by each
principal component axis is: (a) principal component axis 1, 55%; (b)
principal component axis 2, 72%; and (c) principal component
axis 3, 79%. We also applied hierarchical clustering, similar to
approaches used by others (26), based on Euclidean space
analysis (1 − Pearson's correlation coefficient matrix). The latter
approach could not completely separate two cancers from the
clusters of noncancers (Fig. 5B). PCA6-based multidimensional
scaling visualization separated breast cancers (triangles) and
noncancer tissue (circles) into linearly separable gene
expression data space. However, it should be noted that neither
approach provides a statistical assessment of separability, only a
visualization of data structure. Whereas the number of data
points is limited, the multidimensional scaling visualization is
consistent with our ability to identify a putative molecular
profile that can separate neoplastic from nonneoplastic tissues.

This subset of genes is expected to include some false
positives, reflecting the type 1 error associated with the
selection. Consequently, data dimensionality was further reduced
using more conservative criteria (P < 0.02 and regulation
≥1.8-fold). We chose this fold regulation to include all approximately
2-fold differences in mean gene expression levels between cancer
and noncancer tissues. The analysis produced a 30-dimensional data
set; 25 signals (genes) were up-regulated in the neoplastic
biopsies (Table 4A), and 5 signals were up-regulated in the
nonneoplastic biopsies (Table 4B). The ability of this subset to
separate cancer from noncancer was also evaluated using both
our PCA-based multidimensional scaling approach and simple
hierarchical clustering. The cumulative proportion of the
variance captured by each principal component axis is: (a) principal
component axis 1, 64%; (b) principal component axis 2, 75%;
and (c) principal component axis 3, 82%. Neoplastic and
nonneoplastic tissues (Table 3) were now linearly separable in gene
expression data space by both visualization methods (Fig. 6, A
and B).
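
Because the percentages reported for the principal component axes are cumulative, the share of variance contributed by each individual axis is the difference between successive values. The short sketch below works through that arithmetic for the two data sets; the function name is hypothetical, and only the cumulative percentages are taken from the text.

```python
def per_axis_from_cumulative(cumulative_percentages):
    """Convert cumulative variance percentages to per-axis percentages."""
    per_axis = []
    previous = 0
    for value in cumulative_percentages:
        per_axis.append(value - previous)
        previous = value
    return per_axis


# Cumulative percentages reported for principal component axes 1-3.
top_103_genes = per_axis_from_cumulative([55, 72, 79])
top_30_genes = per_axis_from_cumulative([64, 75, 82])
```

On this reading, the first axis alone carries 55% of the variance in the 103-gene data set and 64% in the 30-gene data set, and the second and third axes add 17% and 7%, and 11% and 7%, respectively.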

Neural Network Predictors of Biopsy Phenotypes.
Having reduced the dimensionality, it was necessary to assess
whether the expression patterns of remaining genes in the 103
and 30 dimensions contained useful discriminatory information.
Thus, the ability of various gene subsets to train accurate neural
network predictors that could predominantly separate cancer
from noncancer tissues was assessed. The three configurations
tested (1-3 hidden nodes) for genes within the 30- and
103-dimensional data sets are described in "Materials and Methods."
All were evaluated using the leave-one-out method. Whereas the
number of microarrays from which the data are obtained is small
(n = 10), each configuration achieved a 0% misclassification
rate (network training) for cancer versus noncancer, whether in
103 or 30 dimensions and with either log10 or nontransformed
gene expression data.

Because the initial training and testing were done on the
original data set from the Georgetown University samples, we
tested the neural networks against an independent data set of
nine frozen breast specimens from the University of Edinburgh.
These were snap-frozen mastectomy specimens rather than core
needle breast biopsies, but they should contain a mixture of
cancer and noncancer cells and provide a strong and independent
challenge for the neural network. The neural network model
should accurately predict as cancer any biopsy comprising
>80% cancer tissue.

6 The abbreviations used are: PCA, principal component analysis; ER,
estrogen receptor.

Fig. 6 Structure of the gene expression data using the top 30 genes. A,
projection from top 30 genes (30 dimensions), selected by t test
comparisons of log10-transformed gene expression data (P ≤ 0.02;
≥1.8-fold difference) and projected into three dimensions; B, hierarchical
clustering of samples in 2 dimensions based on 1 − Pearson's coefficient
matrix of the top 30 genes. Snap-frozen tissue: ○, noncancer; △, cancer;
RNAlater-processed tissue: ●, noncancer; ▲, cancer.

Gene expression data were generated using the same
Research Genetics filter technology and queried in the predictive
model. For both the 103 and 30 gene data sets, the
nontransformed data provided the more accurate models. Both models
predicted that all nine samples should be cancer and not
noncancer. The pathologist who evaluated the samples for the
training set subsequently performed histopathological analysis
of stored samples of these tissues. All nine samples were
confirmed as >80% cancer specimens. Thus, no samples in the
independent test data set were misclassified, demonstrating the
neural network's predictive accuracy. When the log10 data were
used, the models misclassified 1 of 9 tumors (30 dimensions;
89% accurate) and 2 of 9 tumors (103 dimensions; 78%
accurate). The lower classification rate with the 103 genes probably
reflects the increased type 1 error associated with this data set
and the failure to exclude some uninformative genes.
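
The accuracy figures quoted above follow directly from the misclassification counts on the nine independent specimens. The sketch below reproduces that arithmetic; the function and variable names are hypothetical, and only the counts are taken from the text.

```python
def percent_correct(n_misclassified, n_total):
    """Classification accuracy as a whole-number percentage."""
    return round(100 * (n_total - n_misclassified) / n_total)


# Misclassification counts reported for the nine Edinburgh specimens.
nontransformed_accuracy = percent_correct(0, 9)
log10_30_gene_accuracy = percent_correct(1, 9)
log10_103_gene_accuracy = percent_correct(2, 9)
```

With only nine test samples, each misclassified tumor changes the accuracy by about 11 percentage points, so the difference between the two log10 models rests on a single specimen.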

Genes  Differentially  Expressed  between  Breast  Cancer 
and  Noncancer  Tissues.  The  data  in  Table  4  show  that  the 
choice  of  t  test  has  only  a  marginal  effect  on  data  selection  for 
supervised  dimension  reduction.  If  we  make  no  assumption 
regarding  distribution  of  the  data,  approximately  1  in  3  genes 
Table 5 Function of selected genes

Gene name(s)                          UniGene no.(a)  Function                                              Ref. no.
BTF2 (butyrophilin)                   Hs.167741       Glycoprotein component of human milk fat globule      35 and 41
                                                      membranes; membrane-associated receptor for
                                                      association of cytoplasmic droplets with the
                                                      apical plasma membrane
grb14 (growth factor receptor-bound   Hs.83070        Member of the grb7 family; phosphorylated by a        37
  protein 14)                                         PDGF-regulated serine kinase; expression
                                                      correlates with ER expression
TAB1; TAK1 (MAPKKK; TGFβ-activated    Hs.31472        Stimulates NFκB activation; implicated in             42 and 43
  kinase-1) binding protein-1                         signaling in response to TGF-β and TNF-α;
                                                      activates plasminogen activator inhibitor 1
α-Catenin                             Hs.178452       Cell adhesion molecule; binds E-cadherin;             38 and 44
                                                      associated with tumor grade and ER expression
DAP3 (death-associated protein-3)     Hs.159627       Proapoptotic, nucleotide-binding protein              45 and 46
Ceruloplasmin; ferroxidase            Hs.28896        Copper transport protein; present in breast milk;     33, 34, and 47
                                                      serum levels are elevated in breast cancer
                                                      patients with progressive disease but not in
                                                      patients in remission or those with benign
                                                      breast lesions
Ah receptor (aryl hydrocarbon         Hs.172287       Binds environmental toxins; interacts with ER;        36, 48, and 49
  receptor)                                           can block the transcriptional activity of ER;
                                                      binds ER co-regulators ERAP140 and SMRT
NF2 (neurofibromatosis-2)             Hs.902          Tumor suppressor; rarely mutated in breast cancer     50 and 51
Frizzled-related gene                 Hs.7306         Secreted protein; lost in ~80% of breast cancers;     52 and 53
                                                      apoptosis related gene, induced by Adriamycin

(a) The UniGene databases can be found at http://www.ncbi.nlm.nih.gov/UniGene/Hs.Home.html.
(b) PDGF, platelet-derived growth factor; NFκB, nuclear factor κB; TGF, transforming growth factor; TNF, tumor necrosis factor.


would be rejected by relying solely on the nonparametric
analyses, a ≥1.8-fold differential expression, and a cutoff of P ≤
0.02. The 30 target cDNAs comprising the 30-dimensional data
set are presented in Table 4.

DISCUSSION 

Generally, prospective study designs are more valid for the
exploration or validation of new predictive and prognostic
factors. Retrospective breast cancer studies may be compromised
by the bias toward larger tumors in many existing frozen tumor
banks, whereas the average size of most newly diagnosed breast
tumors continues to decrease (9). Thus, many studies into the
molecular biology of such early lesions may need to be done
prospectively. Investigators at single academic institutions can
often prospectively obtain frozen samples under a rigorous
collection protocol. However, the ability to do so at multiple
institutions or when local clinics and community physicians are
also involved can be problematic. A rapid, standard tissue
processing approach should allow for the use of tissues from
multiple institutions in a controlled manner. For example, it
should be possible to reduce changes in molecular
profiles associated with differences in tissue acquisition and
processing. Whereas these concerns have not been explored in
detail experimentally, tissue processing clearly affects the
performance of other molecular biological technologies applied to
human biopsies and tumor tissues (10).

To address these issues, we conducted studies to identify
an optimal tissue acquisition, processing, and analysis
procedure for exploring the gene expression profiles of
prospectively accrued breast core needle biopsies. Because RNA
extraction destroys tissue architecture, we developed a novel
method for tissue processing that would allow us to obtain
samples in a uniform manner, preserve RNA quality/quantity,
and, most importantly, retain all potentially diagnostically
relevant information.

Tissue placed in RNAlater can be held
for up to 1 h at 37°C, 1 week at 25°C, and >1 month at 4°C and
retain fully intact RNA (31). Our data show that biopsies
processed immediately in either liquid nitrogen or RNAlater can
produce sufficient concentrations of high-quality RNA for nylon
filter microarray analysis without RNA amplification. This
amount of RNA is also adequate for amplification for use with
other gene expression microarray technologies (32). If
processed carefully, tissue architecture can be maintained from
biopsies collected in RNAlater. This is clearly important
because some small breast lesions can be completely removed by
the biopsy procedure. These core biopsies should not be used for
studies if critical diagnostic information could be lost. We
estimate that, using the approaches described in this study,
approximately 90% of suitable core needle breast biopsies
should produce sufficient material for gene expression
microarray studies.

Our studies demonstrate that the RNA recovered can be
used to generate relevant gene expression microarray
information. Relevance is evident from our abilities to identify
differentially expressed genes associated with breast cancer cells and
to build accurate neural network predictors that can identify
cancer from noncancer samples based solely on their gene
expression profiles.

Among the differentially expressed genes in the reduced 30-dimensional space, we
would expect to find either some genes already implicated in breast cancer or
known to be expressed in normal or neoplastic breast tissues. Consistent with
this expectation, several genes of potential relevance were identified. For
example, ceruloplasmin is up-regulated in neoplastic breast tissues, and elevated
serum levels of ceruloplasmin are associated with recurrent breast cancers
(33, 34). The BT2 glycoprotein is a milk protein (35) and might be expected to be
expressed in tissues predominately composed of breast epithelial cells.

ER protein status is determined routinely for cancer but not noncancer biopsies.
Because four of the five tumor biopsies were ER positive, we also would expect to
find genes with expression patterns known either to be associated with ER or to
modulate ER function. At least three genes meet these criteria. The aryl
hydrocarbon receptor is known to interact with ER and affect its function (36),
and the expression of both grb14 and α-catenin is associated with ER expression
in breast tumors (Refs. 37 and 38; Table 5).

The  discriminant  power  of  the  genes  selected  is  evident 
from  the  accuracy  of  the  neural  networks  built  using  the  data 
from  the  initial  five  cancer  and  five  noncancer  biopsies.  The 
ability  to  accurately  identify  independent  samples  as  cancer 
shows  that  the  genes  of  interest  are  expressed  or  repressed  in 
both  patterns  and  at  levels  consistent  with  the  model.  This  is  an 
appropriate  and  rigorous  test  of  the  approach  because  the  goal 
was to build molecular predictors, rather than to identify functionally relevant
genes. Building a predictor is also a much more efficient test of the selected
genes than would be obtained by simply confirming expression gene by gene in more
standard assays: Northern blot, RNase protection, or real-time PCR. Confirming
the differential expression of each gene is unnecessary for building clinically
relevant predictive models. Unlike studies to identify functionally relevant
genes, the discriminant power of each signal from the target cDNAs on the array
is independent of whether that signal originates from hybridization to its
expected mRNA.

The gene expression profile data and neural network performance suggest that, at
least for samples of very different biologies, contamination of samples with >80%
of other cell types may not confound analyses for molecular profiling. Whether
this observation can be extrapolated to other studies remains to be further
established. Nonetheless, the resource-intensive requirements of microdissection
and RNA amplification may not be absolute requirements for all molecular
profiling studies.

The  tissue  acquisition  and  processing  methods,  dimension 
reduction,  data  visualization  approaches,  and  neural  network 
analyses  we  describe  may  be  useful  in  the  design  of  larger 
prospective studies. We continue to develop other data visualization,
normalization, and exploration algorithms that also may be of use in the analysis
of gene expression microarray studies (24, 25, 39, 40).

SUMMARY

We  propose  a  block  principal  component  analysis  method  for  extracting  information  from  a  database 
with  a  large  number  of  variables  and  a  relatively  small  number  of  subjects,  such  as  a  microarray  gene 
expression  database.  This  new  procedure  has  the  advantage  of  computational  simplicity,  and  theory  and 
numerical  results  demonstrate  it  to  be  as  efficient  as  the  ordinary  principal  component  analysis  when 
used  for  dimension  reduction,  variable  selection  and  data  visualization  and  classification.  The  method  is 
illustrated  with  the  well-known  National  Cancer  Institute  database  of  60  human  cancer  cell  lines  data 
(NCI60)  of  gene  microarray  expressions,  in  the  context  of  classification  of  cancer  cell  lines.  Copyright 
©  2002  John  Wiley  &  Sons,  Ltd. 

KEY  WORDS:  principal  component  analysis;  grouping  of  variables;  similarity;  gene  expression; 
microarray  data  analysis 


1.  INTRODUCTION 

Principal component analysis is one of the most common techniques of exploratory
multivariate data analysis. It is a method of transforming a set of p correlated
variables $x = (x_1, x_2, \ldots, x_p)$ to a set of p uncorrelated variables
$y = (y_1, y_2, \ldots, y_p)$ that are linear functions of the x’s, referred to as
the p principal components of x, such that the variances of the y’s are in
descending order with respect to the variation among the x’s. Usually the first
several components explain most of the variation among the x’s. In addition to
many other applications, principal component analysis has been shown to be a
useful tool in reducing data dimension and extracting information, in seeking
important regressors in regression analysis, and in effectively visualizing and
clustering subjects, when measurements on a large number of variables are
collected from each subject. The book by Jolliffe [1] provides excellent reading
on this topic, although other textbooks on multivariate data analysis do also (for
example, references [2] and [3]). Recently, principal component analysis has found
application in the analysis of microarray gene expressions [4], a growing
technology in human genome studies [5, 6].

When  dealing  with  an  extremely  large  number  of  variables  (for  example,  500  or  more), 
deriving  principal  components  can  be  computationally  intensive,  since  it  involves  finding  the 
eigenvectors (and eigenvalues) of a matrix with large dimensions. Moreover, a
linear combination of such a large number of variables becomes less meaningful to
the investigators since the high dimensionality makes it hard to extract useful
information and to interpret the combination. In one microarray technology, cDNA
clone inserts are printed onto a glass slide and then hybridized to two
differentially fluorescently labelled probes. The final gene
expression  profile  contains  fluorescent  intensities  and  ratio  information  of  many  hundreds  or 
thousands  of  genes.  If  one  intends  to  apply  principal  component  analysis  directly  to  extract 
gene  expression  information  for  these  genes  from  a  certain  group  of  subjects,  then  one  has 
to  deal  with  a  matrix  with  huge  dimensions. 

In  dealing  with  such  high  dimensional  data,  we  propose  to  perform  the  principal  component 
analysis in a ‘stratified’ way. We first group the original variables into several
‘blocks’ of variables, in the sense that each block contains variables (genes in
the microarray experiments) that are similar; variables from the same block are
more correlated than variables from different blocks. We then perform principal
component analysis within each block and obtain a small number of
variance-dominating principal components. Combining these principal components
obtained from each block forms a new database from which we can then extract
information by performing a new principal component analysis. We term this
procedure ‘block principal component analysis’. Dominating principal components
obtained from the final stage can then
be  used  in  various  data  exploratory  analyses  such  as  clustering  and  visualization. 

The proposed ‘block principal component analysis’ method also enables us to reduce the
number  of  variables  effectively.  Within  each  block,  when  principal  component  analysis  is 
conducted  and  dominating  linear  combinations  of  variables  are  examined,  only  those  variables 
that  have  relatively  large  coefficients  are  retained.  We  will  examine  this  variable  selection 
procedure  in  detail  using  the  gene  microarray  example. 

After  a  brief  review  of  the  mathematical  derivation  of  principal  components  and  their 
applications  in  Section  2,  we  introduce  in  Section  3  the  method  of  ‘block  principal  component 
analysis’.  In  Section  4,  we  investigate  the  efficiency  of  block  principal  components  in  the 
reduction  of  data  dimension  with  respect  to  the  amount  of  variance  explained.  It  is  shown  that 
the  proposed  procedure  can  be  as  efficient  as  the  ordinary  principal  component  analysis.  We 
then  discuss  the  selection  of  informative  variables  using  block  principal  component  analysis. 
In  Section  5  we  apply  the  method  to  the  problem  of  classification  of  microarray  data  from 
the  well-known  National  Cancer  Institute  database  of  60  human  cancer  cell  lines  (NCI60), 
each  of  which  has  gene  microarray  expression  of  more  than  1000  genes  [7].  Some  discussion 
is  given  in  Section  6. 


2.  PRINCIPAL  COMPONENTS 

We  start  with  a  brief  mathematical  derivation  of  principal  components.  More  details  can  be 
found in references [1] or [2] and [3]. Throughout, vectors are viewed as column
vectors, and $A'$ is the transpose of a matrix $A$.

Consider a p-variate random vector $X$ with mean vector $\mu$ and positive
definite covariance matrix $\Sigma$. Let
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \,(> 0)$ be the eigenvalues of
$\Sigma$ and let $W = (w_1, \ldots, w_p)$ be a $p \times p$ orthogonal matrix such
that

$$W' \Sigma W = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p) \tag{1}$$

so that $w_i$ is an eigenvector of $\Sigma$ corresponding to the eigenvalue
$\lambda_i$. Now put $U = W'X = (U_1, \ldots, U_p)'$; then
$\mathrm{cov}(U) = \Lambda$, so that $U_1, \ldots, U_p$ are all uncorrelated, and
$\mathrm{var}(U_i) = \lambda_i$, $i = 1, \ldots, p$. The linear components
$U_1, \ldots, U_p$ are called principal components of $X$. The first principal
component is $U_1 = w_1'X$ and its variance is $\lambda_1$; the second principal
component is $U_2 = w_2'X$ with variance $\lambda_2$; and so on. These p principal
components have the following key property. The first principal component $U_1$
is the normalized (unit length) linear combination of the components of $X$ with
the largest variance, and its maximum variance is $\lambda_1$; then out of all
normalized linear combinations of the components of $X$ which are uncorrelated
with $U_1$, the second principal component $U_2$ has maximum variance
$\lambda_2$. In general, the $k$th principal component $U_k$ has maximum variance
$\lambda_k$ among all normalized linear combinations of the components of $X$
which are uncorrelated with $U_1, \ldots, U_{k-1}$.

Very often these principal components are referred to as population principal
components. In practice $\Sigma$ is not known and has to be estimated from the
sample, yielding the sample principal components. We do not distinguish these two
definitions here.

Once the p principal components are derived, then we can conduct various
statistical analyses using only the first q (< p) principal components which
account for most of the variance of X. For example, we can plot the first two
(three) principal components in a two- (three-) dimensional space to seek
interesting patterns among the data, or perform clustering analysis on subjects
in order to search for clusters among the data. We can also use these leading
principal components as regressors in a regression analysis to find prognostic
factors for clinical outcomes (for example, drug response or resistance). See
reference [1] for various other applications of principal component analysis.
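
The derivation above can be sketched numerically. The following is a minimal
illustration only, assuming NumPy and randomly generated toy data; the function
name `principal_components` is hypothetical and not taken from the paper. It
diagonalizes the sample covariance matrix, orders the eigenvalues in descending
order, and forms the component scores.

```python
import numpy as np

def principal_components(X):
    """Return (eigenvalues, W, U) for an n x p data matrix X.

    W' S W = diag(eigenvalues), eigenvalues in descending order,
    and U = (X - mean) W holds the principal component scores.
    """
    Xc = X - X.mean(axis=0)               # centre each variable
    S = np.cov(Xc, rowvar=False)          # p x p sample covariance
    eigvals, W = np.linalg.eigh(S)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # re-sort to descending order
    eigvals, W = eigvals[order], W[:, order]
    U = Xc @ W
    return eigvals, W, U

# Toy data (an assumption for illustration): 200 subjects, 5 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

eigvals, W, U = principal_components(X)
cov_U = np.cov(U, rowvar=False)                   # should be diag(eigvals)
explained = np.cumsum(eigvals) / eigvals.sum()    # share explained by first q PCs
```

The cumulative shares in `explained` are what one would inspect to choose q.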

Derivation of principal components involves computation of eigenvalues and
eigenvectors of the $p \times p$ matrix $\Sigma$ (or its sample estimate). When p
is very large, the computation becomes extremely intensive. Moreover, it is always
in the interest of the investigators to examine the first several leading
principal components in order to find useful information. With a linear
combination of a large number of variables, this becomes extremely difficult and
results are hard to interpret. To deal with these problems, we develop the ‘block
principal component analysis’ method to be discussed in the following sections.


3.  BLOCK  PRINCIPAL  COMPONENT  ANALYSIS 

Ordinary principal component analysis needs to find an orthogonal matrix $W$ such
that $W' \Sigma W$ is diagonal. In a very extreme case when all of the components
of X are independent, the p principal components are the p components of X, and W
is merely some permutation of the identity matrix, rearranging the components of X
according to their variances. If the random vector X can be partitioned into k
uncorrelated random subvectors, so that $\Sigma$ is block diagonal, then performing
principal component analysis with X is equivalent to performing principal
component analysis with each subvector and then combining all the principal
components from all subvectors. This simple fact leads to the consideration of
‘block principal component analysis’ even when X does not have uncorrelated
partitions.

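Let $X$ be partitioned as $X = (X_1', \ldots, X_k')'$ with $X_i$ being
$p_i$-dimensional, where $p_1 + \cdots + p_k = p$, and $\Sigma$ be partitioned
accordingly as

$$\Sigma = \begin{pmatrix}
\Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1k} \\
\vdots & \vdots & & \vdots \\
\Sigma_{k1} & \Sigma_{k2} & \cdots & \Sigma_{kk}
\end{pmatrix} \tag{2}$$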


Let $W_i = (w_{i1}, \ldots, w_{ip_i})$, $i = 1, \ldots, k$, be a $p_i \times p_i$
orthogonal matrix such that

$$W_i' \Sigma_{ii} W_i = \Lambda_i = \mathrm{diag}(\lambda_{i1}, \ldots, \lambda_{ip_i}), \quad \lambda_{i1} \ge \cdots \ge \lambda_{ip_i} \tag{3}$$

so that $w_{ij}$, $j = 1, \ldots, p_i$, is an eigenvector of $\Sigma_{ii}$
corresponding to the eigenvalue $\lambda_{ij}$. Put
$U_i = W_i' X_i = (U_{i1}, \ldots, U_{ip_i})'$; then the $p_i$ components $U_{ij}$,
$j = 1, \ldots, p_i$, of $U_i$ define the $p_i$ principal components, referred to
as ‘block’ principal components, with respect to the random vector $X_i$, the
$i$th block of variables of $X$.

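Now define

$$Q = \mathrm{diag}(W_1, \ldots, W_k) \tag{4}$$

also an orthogonal matrix, and

$$Y = Q'X = (U_1', \ldots, U_k')' \tag{5}$$

a random vector combining all ‘block’ principal components, then

$$\mathrm{cov}(Y) = \Omega = Q' \Sigma Q = \begin{pmatrix}
\Lambda_1 & W_1' \Sigma_{12} W_2 & \cdots & W_1' \Sigma_{1k} W_k \\
\vdots & \vdots & & \vdots \\
W_k' \Sigma_{k1} W_1 & W_k' \Sigma_{k2} W_2 & \cdots & \Lambda_k
\end{pmatrix} \tag{6}$$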


Note that $\Omega$ and $\Sigma$ have the same eigenvalues, and in particular,
$\mathrm{tr}(\Omega) = \mathrm{tr}(\Sigma)$, where tr stands for the trace (sum of
all diagonal elements) of a matrix. Hence $X$ and $Y$ have equal total variance.
Let $W$ be defined as in (1), and

$$R = Q'W \tag{7}$$

then $R$ is also an orthogonal matrix and

$$R'\,\mathrm{cov}(Y)\,R = W' \Sigma W = \mathrm{diag}(\lambda_1, \ldots, \lambda_p) \tag{8}$$

that  is,  the  p  principal  components  of  Y  are  identical  to  those  of  X. 
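
The step from (7) to (8) uses only the orthogonality of Q, that is $QQ' = I$.
Written out with $\mathrm{cov}(Y) = Q' \Sigma Q$ and $R = Q'W$, a worked version
of the identity (our expansion, not part of the original text) is:

```latex
R'\,\mathrm{cov}(Y)\,R
  = (Q'W)'\,(Q' \Sigma Q)\,(Q'W)
  = W'\,(QQ')\,\Sigma\,(QQ')\,W
  = W' \Sigma W
  = \mathrm{diag}(\lambda_1, \ldots, \lambda_p).
```

Orthogonality of R follows in the same way: $R'R = W'QQ'W = W'W = I$.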

Hence,  we  can  obtain  the  principal  components  of  a  random  vector  X  by  two  steps.  In 
the  first  step,  we  group  the  variables  in  X  into  several  blocks,  and  then  derive  principal 
components  for  each  block  of  variables.  In  the  second  step,  we  define  a  new  random  vector 
Y  by  combining  all  the  ‘block’  principal  components  and  then  obtain  the  principal  components 
of  Y,  which  are  identical  to  the  principal  components  of  X. 
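
The two-step procedure can be checked numerically. The sketch below is
illustrative only: it assumes NumPy, a randomly generated positive definite
covariance matrix, and an arbitrary (hypothetical) partition into three blocks.
It builds Q from the within-block eigenvectors, forms the covariance of
Y = Q'X, and compares the eigenvalues and component directions with those
obtained directly from the full covariance matrix.

```python
import numpy as np

def eig_desc(S):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(1)
p = 6
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)          # toy positive definite covariance
blocks = [[0, 1, 2], [3, 4], [5]]        # hypothetical partition of the variables

# Step 1: PCA within each block gives Q = diag(W_1, ..., W_k).
Q = np.zeros((p, p))
for idx in blocks:
    _, W_i = eig_desc(Sigma[np.ix_(idx, idx)])
    Q[np.ix_(idx, idx)] = W_i

# Step 2: PCA of Y = Q'X, whose covariance is Q' Sigma Q.
Omega = Q.T @ Sigma @ Q
lam_X, W = eig_desc(Sigma)               # ordinary PCA of X
lam_Y, R = eig_desc(Omega)               # PCA of the combined block components

# Q R and W should agree column by column up to sign when eigenvalues are distinct.
alignment = np.abs((Q @ R).T @ W)
```

With distinct eigenvalues, `alignment` is numerically the identity matrix, which
is the statement that the principal components of Y coincide with those of X
however the blocks are chosen.
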

The geometrical interpretation of block principal component analysis is quite
clear. The p-dimensional random vector X represents the p axes in a p-dimensional
space. The p principal components rotate the X-space to one whose axes are defined
by the p principal components. In order to rotate the original space to its
desired direction, we can first group the axes and rotate the axes within each
group and then do an overall rotation to achieve the desired direction.

From  the  mathematical  derivation  above,  we  notice  that  this  procedure  always  yields  the 
principal  components  of  X,  regardless  of  how  the  blocks  are  defined.  The  choice  of  blocks, 
however, does have effects on several aspects. First, if X can be divided into
uncorrelated blocks, then the components in Y are the principal components of X,
and there is no need to orthogonalize Y. Second, even when X cannot be partitioned
into uncorrelated blocks, if the off-diagonal terms $W_i' \Sigma_{ij} W_j$ are
relatively small, as measured, say, by a matrix norm (for example, the sum of
squares of all elements), then without losing much information, we can still use
the components of Y as an approximation to the principal components of X.
Third,  when  reduction  in  dimension  and  in  the  number  of  variables  is  conducted  within  each 
block,  which  will  be  discussed  in  the  next  section,  we  would  expect  that  variables  within  each 
block  are  much  more  correlated  than  variables  from  two  different  blocks,  so  that  selection 
of  dimensions  and  of  variables  from  one  block  will  not  be  much  affected  by  selection  of 
variables  from  another  block.  For  these  reasons,  we  recommend  grouping  the  variables  into 
blocks  according  to  their  correlation.  This  can  be  achieved  by  clustering  the  variables  using 
a  proper  function  of  Pearson's  correlation  coefficient  as  the  measure  of  similarity  between 
variables;  one  such  measure  is  given  in  Section  5  of  the  paper. 

4.  DIMENSION  REDUCTION  AND  VARIABLE  SELECTION 
4.1.  Dimension  reduction 

A  major  application  of  principal  component  analysis  is  to  reduce  data  dimension  so  that 
the  data  structures  can  be  explored  or  even  visualized  in  a  low-dimensional  space.  When 
data  dimension  is  extremely  high,  block  principal  component  analysis  allows  us  to  reduce 
data  dimension  more  effectively  without  losing  much  information.  We  propose  the  following 
procedure to achieve low dimension. Suppose k blocks, $X_i$, with dimension $p_i$
and covariance matrix $\Sigma_{ii}$, $i = 1, \ldots, k$, of the original variables
$X$, are determined according to the correlation between variables. For each block
$X_i$ we derive the $p_i$ principal components, and retain only the first $q_i$
$(< p_i)$ principal components, say, $U_{ij}$, $j = 1, \ldots, q_i$, so that the
total variance explained by these $q_i$ principal components is
$\pi_i\,\mathrm{tr}(\Sigma_{ii})$, where $0 < \pi_i < 1$. Now define

$$Y = (U_{11}, \ldots, U_{1q_1}, \ldots, U_{k1}, \ldots, U_{kq_k})' \tag{9}$$

a variable combining all principal components selected from each block. We then
obtain the principal components of $Y$, and choose the first $l$ principal
components, say $Z_1, \ldots, Z_l$, which explain a high percentage, $100\pi$ per
cent (for example, $\pi = 95$ per cent), of the total variance of $Y$. Data
visualization and classification with the original variable $X$ is then conducted
based on these $l$ principal components.

These block principal components preserve many optimal properties of the ordinary
principal components: (i) $Z_1, \ldots, Z_l$ are uncorrelated; and (ii)
$\mathrm{var}(Z_1) \ge \cdots \ge \mathrm{var}(Z_l)$. However, these variances are
no longer the eigenvalues of $\Sigma$, the covariance matrix of the original
variables $X$. Instead, they are the eigenvalues of the covariance matrix of $Y$;
(iii) the total variance of $Z_1, \ldots, Z_l$ is

$$\pi\,\mathrm{tr}[\mathrm{cov}(Y)] = \pi \sum_{i=1}^{k} \sum_{j=1}^{q_i} \mathrm{var}(U_{ij}) = \pi \sum_{i=1}^{k} \pi_i\,\mathrm{tr}(\Sigma_{ii})$$

which accounts for $100\tilde{\pi}$ per cent of the total variance of $X$, where

$$\tilde{\pi} = \frac{\pi \sum_{i=1}^{k} \pi_i\,\mathrm{tr}(\Sigma_{ii})}{\mathrm{tr}(\Sigma)} = \frac{\pi \sum_{i=1}^{k} \pi_i\,\mathrm{tr}(\Sigma_{ii})}{\sum_{i=1}^{k} \mathrm{tr}(\Sigma_{ii})}$$

We hence have

$$\tilde{\pi} \ge \pi \min_i \{\pi_i\} \tag{10}$$

When  using  principal  components  to  explore  (for  example,  cluster,  visualize)  the  data,  we 
expect  that  the  leading  components  explain  most  of  the  variance  so  that  they  will  reveal  the 
true nature of the data structure; (10) asserts that block principal components
$Z_1, \ldots, Z_l$ will retain most of the variance if, within each block and for
the final principal component analysis, the selected principal components explain
most of the variance. For example, if $\pi_i \ge 95$ per cent,
$i = 1, \ldots, k$, and $\pi \ge 95$ per cent, then $\tilde{\pi} > 90$ per cent.
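
The dimension reduction procedure and the bound (10) can be illustrated with a
small numerical sketch. Everything below is an assumption made for illustration:
NumPy, simulated data, a hypothetical partition into three blocks, a 95 per cent
threshold at both stages, and the helper name `leading`. The symbol names follow
the text, with `pi_tilde` standing for the overall fraction of the variance of X
that is retained.

```python
import numpy as np

def leading(S, frac):
    """Keep the fewest leading eigenvectors of S whose cumulative share of
    tr(S) reaches `frac` (0 < frac < 1). Returns (eigenvectors, share)."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    share = np.cumsum(vals) / vals.sum()
    q = int(np.argmax(share >= frac)) + 1      # first index reaching frac
    return vecs[:, :q], float(share[q - 1])

rng = np.random.default_rng(2)
n, p = 50, 12
X = rng.normal(size=(n, 4)) @ rng.normal(size=(4, p)) + 0.3 * rng.normal(size=(n, p))
Sigma = np.cov(X, rowvar=False)
blocks = [list(range(0, 4)), list(range(4, 9)), list(range(9, 12))]  # hypothetical

Q_cols, pis = [], []
for idx in blocks:
    W_i, pi_i = leading(Sigma[np.ix_(idx, idx)], 0.95)   # q_i leading block PCs
    Q_i = np.zeros((p, W_i.shape[1]))
    Q_i[idx, :] = W_i                                    # embed block loadings
    Q_cols.append(Q_i)
    pis.append(pi_i)

Q = np.hstack(Q_cols)                    # p x (q_1 + ... + q_k)
cov_Y = Q.T @ Sigma @ Q                  # covariance of the retained block PCs
W_Y, pi = leading(cov_Y, 0.95)           # final-stage PCA of Y
var_Z = np.trace(W_Y.T @ cov_Y @ W_Y)    # total variance of Z_1, ..., Z_l
pi_tilde = var_Z / np.trace(Sigma)       # fraction of the variance of X retained
bound = pi * min(pis)                    # right-hand side of (10)
```

Here `pi_tilde` should never fall below `bound`; with thresholds of 95 per cent
at both stages the bound is at least 0.9025.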


4.2.  Variable  selection 

When  the  number  p  of  variables  is  very  large,  many  variables  can  be  highly  correlated  with 
each other and some may become redundant when the rest are being used to explore
data structure. For example, in a gene microarray experiment where gene expression
of a large number of genes is obtained for a number of tissues, tissue
classification based on all genes may be quite similar to that based on a small
group of genes. If this is the case, then with respect to tissue classification,
only these genes are informative and the rest become
redundant,  assuming  that  using  all  the  genes  indeed  captures  the  real  structure  of  the  data.  It 
is  therefore  important  to  select  variables  that  contain  almost  all  information,  with  respect  to 
certain  statistical  properties,  that  all  variables  would  contain. 

Block  principal  component  analysis  can  be  used  to  select  these  variables.  We  propose  the 
following  two  steps: 

Step 1. Divide the original variable $X$ into $k$ blocks, $X_i$,
$i = 1, \ldots, k$, according to correlation between variables.

Step 2. For each block $X_i$, conduct principal component analysis and select the
first $q_i$ leading principal components such that the total variance of $X_i$ is
explained by a satisfactory amount, say, at least 95 per cent. Examine the
coefficients (called loadings in much of the principal component analysis
literature) of the variables in $X_i$ in these $q_i$ leading components and retain
only those variables with large coefficients. Combine all the variables selected
from each block and then use only these variables for further analysis.

A  third  step  may  also  be  useful  if  the  number  of  variables  selected  is  still  too  large: 

Step 3. Conduct principal component analysis again, but based only on the
variables selected in step 2. Select the first several leading principal
components to explain most of the variance. Then examine the variables again and
retain those with large coefficients in these leading combinations.

There  is  no  universal  criterion  for  how  many  and  which  variables  should  be  selected  from 
the  leading  principal  components.  Jolliffe  [1]  recommended  choosing  a  variable  from  each 
leading  principal  component  with  the  largest  absolute  coefficient,  if  the  variable  has  not  been 
selected from previous leading components. In practice some modifications of
Jolliffe’s procedure may also be effective. For example, one can choose several
variables from each leading principal component with the largest absolute
coefficients. For more discussion, see reference [1].
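
One reading of the one-variable-per-component rule can be sketched as follows.
This is an illustration under stated assumptions, not the authors’ code: the
function name `select_variables` is hypothetical, the coefficient matrix is made
up, and we assume that when the variable with the largest absolute coefficient
has already been chosen, the next largest unchosen variable is taken instead.

```python
import numpy as np

def select_variables(W_lead):
    """Pick one variable per leading component: the variable with the largest
    absolute coefficient that has not already been selected.

    W_lead is p x q, columns ordered from the first leading component onward.
    """
    chosen = []
    for j in range(W_lead.shape[1]):
        ranked = np.argsort(-np.abs(W_lead[:, j]))   # largest |coefficient| first
        for v in ranked:
            if int(v) not in chosen:
                chosen.append(int(v))
                break
    return chosen

# Illustrative coefficients only (not from a fitted model): variable 0 dominates
# both components, so the second component falls back to variable 1.
W_lead = np.array([[0.9, -0.7],
                   [0.4,  0.6],
                   [0.1,  0.4]])
selected = select_variables(W_lead)
```

Applied block by block to the leading components of each block, a rule of this
kind yields as many variables as there are retained components.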

In the next section, we demonstrate the block principal component analysis method
using the well-known NCI60 human cancer cell-line data [7] to select a group of
genes to visualize/cluster the cell lines. The result shows such selection to be
quite effective.


5.  APPLICATION  TO  GENE  MICROARRAY  ANALYSIS:  AN  EXAMPLE 

The  NCI60  database  contains  expression  of  more  than  9000  genes  of  60  human  cancer  cell 
lines  from  nine  types  of  cancer  including  colorectal,  renal,  ovarian,  breast,  prostate,  lung  and 
central  nervous  system,  as  well  as  leukaemia  and  melanomas.  Gene  expression  levels  are 
expressed  as  -log(ratio),  where  ratio  =  the  red/green  fluorescence  ratio  after  computational 
balancing  of  the  two  channels.  Readers  are  referred  to  reference  [7]  for  more  details.  The 
data  have  been  made  public  for  analysis  on  the  authors’  web  site  http://discover.nci.nih.gov. 
To  get  familiar  with  the  DNA  microarray  technology,  readers  are  referred  to  references  [5] 
and  [6]  for  more  information. 

One  of  the  objectives  of  this  study  is  to  explore  the  relationship  between  gene  profiles 
and cancer phenotypes. Scherf et al. [7] used a clustering analysis method to
study the relationship. They provided the clustering tree of the 60 cell lines,
based on 1376 genes, and showed that most of the cell lines cluster together
according to their phenotypes (see Figure 2a of reference [7]). One important
question is whether a smaller group of genes can preserve
the  same  relationship  structure. 

We  use  a  selection  method  based  on  block  principal  component  analysis,  as  described  in 
Section  4,  to  tackle  this  issue.  For  simplicity,  we  study  only  cell  lines  from  three  types 
of  cancer,  colorectal  (7  cell  lines),  leukaemia  (6  cell  lines)  and  renal  (8  cell  lines);  each 
cell line has microarray expression of the same 1416 genes. The data set of
interest, 21 cell lines (being the subjects) and 1416 genes (being the variables),
hence forms a 21 × 1416 matrix, representing 21 data points (21 rows of the
matrix) in a 1416-dimensional data space. The complete-linkage clustering tree
based on these 1416 genes is shown in Figure 1(a).
The  dendrogram  is  consistent  with  that  in  reference  [7],  and  shows  clearly  that  the  21  cell 
lines  cluster  according  to  their  cancer  phenotypes.  The  readers  are  reminded  that  phenotype 
information  is  not  used  in  the  clustering,  but  only  to  validate  the  clustering  results.  One 
renal  cell  line  marked  as  ‘RE8’,  which  is  farther  from  the  rest  of  renal  cell  lines,  has  been 
recognized  to  have  some  special  feature  (see  reference  [7]  for  detail). 

Figure 1. Dendrogram of complete linkage hierarchical clustering of 21 cell lines:
(a) tree based on 1416 genes; (b) tree based on 200 genes. CO is colorectal, LE is
leukaemia and RE is renal.

We now seek to determine the blocks for the 1416 genes. Figure 2 shows a plot of
semi-partial $R^2$ versus the number of clusters using the complete-linkage
algorithm and $d_{ij} = \arccos(|\rho_{ij}|)$ as a measure of dissimilarity
between gene $i$ and gene $j$, where $\rho_{ij}$ is the Pearson correlation
coefficient. The semi-partial $R^2$ measures the loss of homogeneity when two
clusters are merged. Define $SS_T$ as the corrected total sum of squares of all
subjects, summed over all variables. For a certain cluster $C$, let $SS_C$ be the
corrected total sum of squares of all subjects in cluster $C$, summed over all
variables. Then the semi-partial $R^2$ for combining two clusters $C_1$ and $C_2$
into one cluster $C$ is $(SS_C - SS_{C_1} - SS_{C_2})/SS_T$. A large semi-partial
$R^2$ indicates a significant decrease in homogeneity. Since subjects within the
same cluster should be similar, two clusters should not be combined as one cluster
if the semi-partial $R^2$ is large. In practice we determine the number of
clusters by minimizing the semi-partial $R^2$; a plot of the semi-partial $R^2$
versus the number of clusters is extremely helpful. More discussion and
computation of semi-partial $R^2$ can be found in reference [8]. Other statistics
can also be used to determine the number of clusters in the data. Milligan and
Cooper [9] examined 30 procedures for determining the number of clusters,
including several variations based on sum of squares.
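
The dissimilarity measure and the semi-partial R2 defined above can be written
down directly. The sketch below is a minimal illustration assuming NumPy and
simulated data; the function names are hypothetical, and the clustering algorithm
itself is not reproduced. Rows of the input are the objects being clustered, so
when genes are grouped into blocks each row would be a gene.

```python
import numpy as np

def gene_dissimilarity(X):
    """d_ij = arccos(|rho_ij|) between the columns (genes) of an n x p matrix X."""
    rho = np.corrcoef(X, rowvar=False)
    return np.arccos(np.clip(np.abs(rho), 0.0, 1.0))   # clip guards against rounding

def corrected_ss(X):
    """Corrected total sum of squares of the rows of X, summed over all variables."""
    return float(((X - X.mean(axis=0)) ** 2).sum())

def semi_partial_r2(X, idx1, idx2):
    """(SS_C - SS_C1 - SS_C2) / SS_T for merging row clusters idx1 and idx2."""
    merged = np.concatenate([idx1, idx2])
    gain = corrected_ss(X[merged]) - corrected_ss(X[idx1]) - corrected_ss(X[idx2])
    return gain / corrected_ss(X)

# Toy data (an assumption): clusters a and b lie close together, c lies far away.
rng = np.random.default_rng(3)
a = rng.normal(0.0, 0.1, size=(5, 2))
b = rng.normal(0.0, 0.1, size=(5, 2)) + 0.2
c = rng.normal(0.0, 0.1, size=(5, 2)) + 10.0
X = np.vstack([a, b, c])
ia, ib, ic = np.arange(0, 5), np.arange(5, 10), np.arange(10, 15)

near = semi_partial_r2(X, ia, ib)   # merging similar clusters: small loss
far = semi_partial_r2(X, ia, ic)    # merging distant clusters: large loss
D = gene_dissimilarity(X)           # 2 x 2 here, since X has two columns
```

Merging the two nearby clusters gives a much smaller semi-partial R2 than merging
clusters that are far apart, which is the behaviour the plot in Figure 2 relies on.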

For the cancer cell-line microarray data, the semi-partial $R^2$ becomes nearly
flat after 14 clusters. This indicates that the 1416 genes can be approximately
divided into 14 clusters; further dividing the data gains little in reducing
heterogeneity. These clusters of genes determine the blocks within each of which
principal component analysis will be conducted. The number of genes in the blocks
ranges from 43 to 158 (Table I). Principal component analysis is conducted within
each block, and the first several leading principal components are then selected,
resulting in a total of 200 principal components. For each block, selected
principal components explain >95 per cent of total variance in that block. For
each block, Table I lists the number of genes, the number of selected principal
components and the percentage of total variance explained by these leading
components.

Figure 2. Determining number of blocks: plot of semi-partial $R^2$ against the
number of clusters.

Table I. Summary of 14 gene blocks.

Block   Number of genes   Number of PCs   Per cent variance
  1          107               14               95.6
  2          154               13               95.3
  3           88               15               95.2
  4          158               15               95.3
  5           68               14               95.2
  6          152               15               96
  7           84               14               95.1
  8          136               15               95.2
  9           44               14               96
 10           84               16               96
 11          143               16               95.1
 12           82               14               96
 13           73               14               95.6
 14           43               11               95.6

For  each  block,  genes  with  largest  coefficients  in  the  selected  leading  principal  components 
are retained, using Jolliffe’s one variable per leading component method. This yields a total
of  200  genes  for  further  analysis. 
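
The block sizes and component counts listed in Table I can be checked against the
totals quoted in the text (1416 genes, 200 retained principal components and
hence 200 selected genes). The short check below uses only the values from
Table I; the variable names are ours.

```python
# Values transcribed from Table I (blocks 1 to 14).
genes = [107, 154, 88, 158, 68, 152, 84, 136, 44, 84, 143, 82, 73, 43]
pcs = [14, 13, 15, 15, 14, 15, 14, 15, 14, 16, 16, 14, 14, 11]

total_genes = sum(genes)                  # genes across all blocks
total_pcs = sum(pcs)                      # one selected gene per retained component
fraction_kept = total_pcs / total_genes   # share of genes kept for further analysis

print(total_genes, total_pcs, round(100 * fraction_kept, 1))
```

The selected genes therefore amount to roughly one seventh of the original set.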

The  first  three  leading  principal  components,  computed  based  on  the  1416  genes,  explain 
only  49  per  cent  of  the  total  variance.  Two-  or  three-dimensional  visualization  of  the  data 
based  on  these  principal  components  can  be  very  misleading.  We  validate  these  selected  200 
genes  by  deriving  a  hierarchical  clustering  tree  for  the  21  cell  lines  based  on  gene  expressions. 
The dendrogram is shown in Figure 1(b). It is remarkably similar to the one based on all
1416 genes (Figure 1(a)). Both illustrate that cell lines with the same phenotype are more
similar than those from different phenotypes. This shows that a much smaller number of genes
can provide the same insight for the data as the whole set of genes, and block principal component
analysis provides an effective way to achieve this. Note that a hierarchical clustering
dendrogram, obtained based on a set of variables, is essentially the same as the hierarchical
clustering dendrogram obtained based on leading principal components, provided that these
leading components explain most of the variation among the variables. The remarkable resemblance
between Figure 1(a) and Figure 1(b) further demonstrates the effectiveness of the
block principal component analysis method, as compared to the ordinary principal component
analysis.


6.  DISCUSSION 

In  this  paper  we  show  that  a  much  smaller  number  of  genes  can  provide  the  same  insight 
for  the  cancer  phenotypes  as  the  whole  set  of  genes.  We  demonstrate  that  block  principal 
component analysis is an effective way to select these genes. This kind of analysis is 'unsupervised',
a term popular in neural network/pattern recognition [10]; cancer phenotypes are
used  only  to  validate  the  algorithm  and  analysis. 

Selection  of  informative  genes  in  the  microarray  setting,  and  other  settings  as  well,  is  by 
no  means  an  easy  task,  especially  when  the  analysis  is  unsupervised.  Very  likely  the  choices 
of  genes  are  not  unique;  there  might  exist  several  groups  of  genes  that  provide  the  same 
classification.  Biostatisticians  should  provide  every  potential  group  of  genes  to  the  medical 
investigators  and  hopefully  a  meaningful  group  of  genes  can  be  determined  by  combining  the 
statistical  guidance  and  biological  knowledge.  Indeed,  some  preliminary  selection  of  genes 
based  on  biological  knowledge  is  extremely  valuable,  even  before  any  statistical  analysis  is 
conducted.  It  should  be  noted,  however,  that  genes  that  are  biologically  similar/dissimilar  may 
not  be  statistically  similar  (correlated)/dissimilar  (uncorrelated). 

ACKNOWLEDGEMENTS 

The  authors  would  like  to  thank  the  editor  and  three  anonymous  referees  for  their  valuable  comments 
and  suggestions,  which  have  improved  the  manuscript. 

REFERENCES 

1. Jolliffe IT. Principal Component Analysis. Springer-Verlag: New York, 1986.

2. Anderson TW. An Introduction to Multivariate Statistical Analysis. 2nd edn. Wiley: New York, 1984.

3. Rencher AC. Methods of Multivariate Analysis. Wiley: New York, 1995.

4. Hilsenbeck SG, Friedrichs WE, Schiff R, O'Connell P, Hansen RK, Osborne CK, Fuqua SA. Statistical analysis
of array expression data as applied to the problem of tamoxifen resistance. Journal of the National Cancer
Institute 1999; 91:453-459.

5. Cheung VG, Morley M, Aguilar F, Massimi A, Kucherlapati R, Childs G. Making and reading microarrays.
Nature Genetics Supplement 1999; 21:15-19.

6. Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. Nature Genetics
Supplement 1999; 21:33-37.

7. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG,
Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN.
A gene expression database for the molecular pharmacology of cancer. Nature Genetics 2000; 24:236-244.

8. Khattree R, Naik DN. Multivariate Data Reduction and Discrimination with SAS Software. SAS Institute
Inc.: Cary, NC, 2000.

9. Milligan G, Cooper MC. An examination of procedures for determining the number of clusters in a data set.
Psychometrika 1985; 50:159-179.

10. Ripley BD. Pattern Recognition and Neural Networks. Cambridge University Press: Cambridge (UK), 1996.


Copyright  ©  2002  John  Wiley  &  Sons,  Ltd. 


Statist.  Med.  2002;  21:3465-3474 




J. Med. Chem. 2002, 45, 390-398


C-7 Analogues of Progesterone as Potent Inhibitors of the P-Glycoprotein Efflux
Pump

Fabio  Leonessa,1  Ji-Hyun  Kim,1  Alom  Ghiorghis,1  Robert  J.  Kulawiec,*  Charles  Hammer,1 
Abdelhossein Talebian,*-5 and Robert Clarke*-1

Departments of Oncology, Physiology and Biophysics, and Lombardi Cancer Center, Georgetown University School of Medicine,
3970 Reservoir Road Northwest, Washington, DC 20007, and Department of Chemistry, Georgetown University,
37th and O Streets Northwest, Washington, DC

Received  March  20,  2001 


The P-glycoprotein product (Pgp) of the MDR1 gene has been implicated in the multiple drug
resistance phenotype expressed by many cancers. Functioning as an efflux pump, P-glycoprotein
prevents the accumulation of high intracellular concentrations of substrates. We have taken a
rational approach to designing inhibitors of P-glycoprotein function, selecting a natural substrate
(progesterone) as our lead compound. We hypothesized that progesterone, substituted at C-7
with an aromatic moiety(s), would exhibit reduced Pgp affinity, significantly increased antiPgp
activity, and reduced affinity for progesterone receptors (PGR). We synthesized
7α-[4'-(aminophenyl)thio]pregna-4-ene-3,20-dione (2), which comprises a C-7α thiol bridge linking
an aminophenyl moiety to progesterone, from pregna-4,6-diene-3,20-dione (1). The subsequent
addition reaction of 2 with the appropriate isocyanate produced an initial series of compounds
(3-6). Compounds 3-5 (respectively, -CH2CH2Cl; -CH2CH3; and -CH(CH3)C6H5) exhibit a
significantly increased ability to inhibit P-glycoprotein. Potency for restoring doxorubicin
accumulation in MDR1-transduced human breast cancer cells is increased up to 60-fold as
compared with progesterone. Compound 5 has greater potency than verapamil and is equipotent
with cyclosporin A for inhibiting P-glycoprotein function. Furthermore, 5 does not bind to PGR,
implying a potential reduction in in vivo toxicity. These data identify C-7-substituted
progesterone analogues, and 5 in particular, as rationally designed antiPgp compounds worthy
of further evaluation/development.


Introduction 

While many cancers are initially responsive to cytotoxic
chemotherapy, most acquire a resistant phenotype.
This phenotype is often characterized by cross-resistance
to structurally unrelated drugs to which the tumor has
not been exposed. The precise genes that confer this
multidrug resistance phenotype are unknown, but there
are several strong single-gene candidates. These include
several ABC transporters including the P-glycoprotein
product of the MDR1 gene (Pgp; gp170),1 the lung
resistance protein,2 the breast cancer resistance protein,3
and several members of the multidrug resistance
associated protein family.4,5 The precise contribution of
each potential multidrug resistance mechanism is unclear.
Indeed, more than one mechanism may operate,
either within the same tumor cell subpopulation and/or
within different subpopulations of the same tumor.

We have chosen to study Pgp-mediated multiple drug
resistance. Pgp confers resistance to drugs by preventing
their accumulation within the cell. Pgp's efflux
capabilities appear to reflect its ability to bind substrates
within the inner leaflet of the plasma membrane.6
Subsequently, and in a potentially adenosine
5'-triphosphate (ATP)-dependent manner, substrates are
expelled from the cell.7 We have shown by meta-analysis
that Pgp expression is detected in >50% of breast
cancers and that this expression is associated with prior
chemotherapy, a worse than partial response to chemotherapy,
and in vitro resistance to Pgp substrates.8
Those data suggest that where Pgp expression is
detected, it likely contributes to multiple drug resistance
in some breast cancers. Nonetheless, the likely role of
Pgp in conferring drug resistance remains controversial.

The poor activity of current antiPgp agents in patients
has  been  attributed  to  the  presence  of  resistance  factors 
in  addition  to  Pgp,  inappropriate  design  of  clinical  trials, 
toxicity,  and/or  lack  of  specificity  of  antiPgp  reagents. 
The  ability  of  reversing  agents  to  alter  the  pharmaco¬ 
kinetics  of  the  coadministered  cytotoxic  drugs,  and  an 
inability  to  achieve  adequate  levels  of  some  reversing 
agents,  also  are  problematic.9  The  absence  of  a  series 
of  nontoxic  drugs,  specifically  designed  to  reverse  Pgp, 
limits  the  design  of  clinical  trials  to  reverse  this  form 
of  multidrug  resistance. 

One aspect of the controversy regarding the role of
Pgp comes from the relatively poor activity of those few
Pgp reversing agents evaluated in clinical trials. Most
attention has focused on the Pgp reversing agents
verapamil, cyclosporin A, and its nonimmunosuppressant
analogue PSC833 (valspodar), but the activity of
other drugs also has been studied in patients. Few of
these compounds were designed as Pgp inhibitors. Thus,
severe side effects, often related to either the "normal"
function of these agents and/or their ability to influence
additional targets, may be induced at concentrations
required to affect Pgp function. Many antiPgp drugs
affect the pharmacokinetics of the substrate, significantly
increasing cytotoxic drug-induced toxicity.10 Other
antiPgp agents cannot readily be delivered at doses that
produce adequate serum levels. For example, the serum
levels of verapamil required to produce in vitro reversal
of Pgp resistance are rarely achieved in patients, despite
administering sufficient doses of verapamil to induce
significant cardiotoxicity.9,11,12 Adequate serum trifluoperazine
levels are not reached in patients at doses that
induce dose-limiting toxicities.9,13 Peak plasma levels of
the stereoisomer of cis-flupenthixol (trans-flupenthixol)
are 1000-fold less than that necessary to achieve full
chemosensitization in vitro.14,15 Several clinical studies
have used patient populations where tumor Pgp expression
is unknown, complicating a clear determination of
its contribution to multiple drug resistance.

Previously, we have established cellular breast cancer
models in which to study Pgp-mediated efflux and
evaluate inhibitors of this activity. These models have
been generated by inducing a constitutive expression
of Pgp, following transduction with retroviral gene
expression vectors.16,17 The major advantage of these
models is that unlike cells selected for resistance in
vitro, Pgp expression is the only mechanism present to
produce the multiple drug resistance phenotype. For
example, the widely used MCF7ADR cells, which were
selected in vitro for resistance to doxorubicin (DOX) and
recently redesignated NCI/ADR-RES,18 exhibit increased
glutathione transferase and topoisomerase II
activities.19,20 Differences in the potency of isomers of
flupenthixol identified in MCF7ADR cells could not be
confirmed in MDR1-transfected NIH 3T3 cells.15

Using our cellular models, we now describe an initial
series of progesterone analogues that exhibit significantly
increased antiPgp activity as compared with
progesterone and verapamil and comparable to that
seen with cyclosporin A. Importantly, the most potent
of these analogues has lost its ability to activate
progesterone receptors (PGR) and is predicted to exhibit
relatively low intrinsic toxicity in vivo.

Chemistry 

Conceptualization and Design. We wished to take
a rational, structure-function-based approach to design
inhibitors of Pgp function. Initially, we hypothesized
that a natural substrate for the pump could provide an
ideal candidate for rational drug design, since it is likely
that Pgp evolved specifically to efflux such molecules.
Evidence shows that several molecules with a steroid
nucleus are Pgp substrates.21-23 Pgp is expressed in the
uterus23,24 and the placenta,25 suggesting a natural role
for protecting secretory cells from the toxic effects of
high local concentrations of steroids. Progesterone is the
most potent of the steroids, including progesterone's
metabolites, for reversing the effects of Pgp expression.23,26,27
Progestins have intrinsically lower toxicity
than other reversing agents and are orally active. In
addition, progesterone is readily available and cheap,
and the chemistry for generating several structural
modifications is relatively straightforward.28,29 Thus, we
chose progesterone as our lead compound.
The major beneficial properties we wished to confer
included but were not restricted to (i) improved potency
for Pgp reversal, (ii) either no change or a reduction in
affinity for PGR, and (iii) no agonist (mitogenic) activities.
Concurrently, we wished to avoid either a substantial
increase in PGR binding or a loss of Pgp reversing
potency.

Unfortunately, the precise structure-function characteristics
of Pgp reversing agents are unknown. This
is not surprising, given the remarkable structural
diversity of Pgp substrates.30 Nonetheless, several
characteristics are apparent, providing generic guidelines
for the design of Pgp reversing agents. Lipophilicity
appears central, with increased lipophilicity strongly
associated with increased antiPgp activity.31-37 Planar
aromatic rings are commonly found in substrates, and
these may contribute to lipophilicity.37 Amphipathicity
also is common, as is the presence of a basic amine,
where primary amines appear most effective.31-33,35,36,38
Size, for example, as determined by calculated molar
refractivity, appears an important factor in several
classes of compounds.32,35,39 C21-aminosteroids have a
structural similarity to progesterone, and in these
compounds, the steroid moiety, lipophilicity, and amphipathicity
are considered important attributes.40 For
compounds composed of two structures joined by a
molecular spacer, the length of the spacer seems
important.15,31,34,41,42 This suggests that some part of the
molecule may be oriented into a "pocket" in Pgp.31 This
pocket may have specific requirements for lipophilicity,
size, and charge.

A C-7 addition to the steroid 17β-estradiol, as occurs
in the antiestrogens ICI 182,780 and ICI 164,384,
produces compounds with low toxicity and potentially
appropriate pharmacokinetics.43,44 Limited evidence
suggests that ICI 164,384 can reverse Pgp-mediated
resistance,45 despite the apparent inability of
17β-estradiol to do so.23 Thus, a bulky C-7 substitution on a
steroid nucleus might increase antiPgp activity.
C-7-substituted progesterone analogues were synthesized 20
years ago, but several exhibit antiprogestational activity.46
Data from these studies suggest that bulky additions
at C-7, when these include an aromatic ring,
reduce PGR affinity by approximately 10-1000-fold.46
Consequently, it may be possible to reduce the endogenous
toxicity of progesterone by reducing/eliminating
its ability to bind PGR.

On the basis of the various structure-function observations
noted above, we hypothesized that progesterone,
substituted at C-7 with an aromatic moiety(s),
would exhibit both reduced Pgp affinity and significantly
increased antiPgp activity. Thus, we designed an
initial compound from which we could derive an appropriate
series of progesterone analogues for evaluation.
This compound, 7α-[4'-(aminophenyl)thio]pregna-4-ene-3,20-dione
(2), has a C-7 thiol bridge linking an
aminophenyl moiety to progesterone. Subsequent additions
to the amine with the appropriate isocyanate
generated the corresponding Pgp analogues. For our
initial series of compounds, we selected isocyanates that
would provide analogues with predicted differences in
the size, lipophilicity, and charge of their C-7 additions.

Synthesis.  Compound  1  (Scheme  1)  was  prepared 
from  progesterone  by  a  modified  Turner  and  Ringold’s 
Table 1. Structure and Physical Properties of C-7
Progesterone Analogues a

compd   R                mp (°C)    Rf b
2       N/A              228-230    0.23
3       -CH2CH2Cl        137-141    0.47
4       -CH2CH3          130-135    0.36
5       -CH(CH3)C6H5     146-149    0.46
6       -SO2C6H4CH3      128-132    0.29

a See Scheme 1 for the structures of 2-6. b Rf in hexane-ethyl
acetate (2:3).


method,47-49 using 2,3-dichloro-5,6-dicyano-1,4-benzoquinone
(DDQ) as an oxidizing agent and p-toluenesulfonic
acid (p-TsOH) in refluxing benzene via Dean-Stark
distillation. Purification of the crude 1 on silica
gel gives 6-dehydroprogesterone (1) as a yellow solid
(35%, Rf = 0.44, 2:3 hexanes-ethyl acetate, mp = 143-145 °C).
Reaction of compound 1 with 4-aminothiophenol
and NaOH pellets, in degassed dioxane as
solvent for 6 days at 74 °C, provided
7α-[(4'-aminophenyl)thio]pregna-4-ene-3,20-dione (2) as an ivory
solid. Crude crystals of 2 were precipitated from a
mixture of hexanes-ethyl acetate and purified by flash
column chromatography to yield 790 mg of white solid
(61%, Rf = 0.23, 2:3 hexanes-ethyl acetate, mp = 228-230 °C).

The additional C-7 progesterone analogues (3-6) were
obtained by reacting compound 2 with the appropriate
isocyanate (Table 1). The general reaction was performed
under a N2 atmosphere for 12 h, until no 2 was
detected by thin-layer chromatography (TLC), and the
solvent was removed under reduced pressure. All crude
analogues were purified by flash column chromatography
to yield the corresponding ureas as a white solid
(40-83%). The physical properties of these analogues
are provided in Table 1 as mp and Rf on silica gels (2:3
hexanes-ethyl acetate).

Results  and  Discussion 

Substrate Accumulation Studies. Because Pgp is
an efflux pump, we measured the ability of our compounds
to influence the intracellular accumulation of
the cytotoxic drugs vinblastine (VBL) and DOX. Both
drugs are widely used clinically and are efficiently
effluxed by Pgp.50 Activity was evaluated in
MDR1-transduced human breast cancer cells
(MDA435/LCC6MDR1), using the parental cells (MDA435/LCC6) as
the Pgp-negative control. Potency of the compounds was
compared with that of progesterone and the established
Pgp inhibitors verapamil and cyclosporin A. VBL content
in Pgp-positive cells, exposed to media containing
5 nM [3H]VBL, was approximately 6-fold lower than
in parental Pgp-negative cells. Cellular content of DOX,
in cells exposed to 4 μM DOX, was about 8-fold lower
in the presence than in the absence of Pgp.

Results of VBL and DOX accumulation studies, summarized
in terms of EC50, are presented in Table 2.
Because these data were estimated from dose-response
curves, representative curves are shown in Figure 1.
Treatment with progesterone analogues 3-6 reverses
the difference in VBL and DOX content between
Pgp-positive and Pgp-negative cells. Analogues 3-5 exhibit
significantly increased antiPgp potency as compared
Table 2. Potency of C-7 Progesterone Analogues, Progesterone, Verapamil, and Cyclosporin A in Reversing the Difference in VBL
and DOX Accumulation between Pgp-Negative and Pgp-Positive Cells

                reversal of [3H]VBL accumulation                  reversal of DOX accumulation
compd           EC50 (μM) a             Pgp-specific EC50 c (μM)  EC50 (μM)               Pgp-specific EC50 (μM)
                (relative potency) b    (relative potency)        (relative potency)      (relative potency)
progesterone    18.7 ± 3.7 d (1)        21.0 ± 4.2 (1)            22.3 ± 2.0 (1)          42.2 ± 7.2 (1)
3               0.8 ± 0.2 (22.5)        0.9 ± 0.2 (23.5)          0.6 ± 0.1 (40.5)        0.7 ± 0.2 (60.2)
4               1.3 ± 0.1 (14.2)        1.5 ± 0.2 (14.0)          0.7 ± 0.07 (31.3)       1.0 ± 0.06 (42.7)
5               0.8 ± 0.2 (24.5)        0.7 ± 0.2 (28.2)          0.6 ± 0.07 (37.2)       0.9 ± 0.09 (44.8)
6               34.8 ± 8.6 (0.5)        33.4 ± 5.2 (0.6)          14.7 ± 3.2 (1.5)        37.0 ± 6.5 (1.1)
verapamil       1.2 ± 0.2 (16.1)        3.1 ± 0.9 (6.8)           2.4 ± 0.3 (6.2)         4.1 ± 0.5 (10.2)
cyclosporin A   0.6 ± 0.06 (32.5)       0.6 ± 0.06 (32.5)         0.5 ± 0.1 (41.9)        0.7 ± 0.2 (60.6)

a EC50 = drug concentration that reduces the difference in drug accumulation between MDA435/LCC6 and MDA435/LCC6MDR1 cells
by 50%; obtained by interpolation on dose-response curves. Representative curves are shown in Figure 1 ([3H]VBL and DOX). b Values
in parentheses represent the potency of each compound relative to the lead compound progesterone. c Pgp-specific EC50 = data corrected
for any effect of test compound on drug accumulation in MDA435/LCC6 (Pgp-negative) cells. d Values represent the mean ± SE obtained
from at least three independent experiments.


Figure 1. Ability of C-7 progesterone analogues to affect [3H]VBL
accumulation (A) and DOX accumulation (B) in MDA435/LCC6
and MDA435/LCC6MDR1 cells, plotted against reversing agent
concentration. Data (mean ± SE) are
from one of three or more representative experiments used to
obtain the EC50 values presented in Table 2. Progesterone =
•, cyclosporin A = ○, verapamil = ▲, 3 = ■, 4 = △, 5 = □, and
6 = ▽.

with progesterone, being 14- to 60-fold more potent. In
marked contrast, 6 is only equipotent with progesterone.
Three compounds (3-5) are significantly more potent
than verapamil, when Pgp-specific EC50s are compared
for both VBL and DOX accumulation. Compounds 3 and
5 were equipotent with cyclosporin A (p > 0.05 for all
comparisons). Recently, we have established a chromatographic
approach for assessing relative Pgp binding
affinities.51,52 Studies to measure the affinity of these
analogues are in progress.
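
The two quantities reported in Table 2 can be illustrated with a short sketch. The table footnotes state only that EC50 values were "obtained by interpolation on dose-response curves" and that relative potency is expressed against progesterone; the linear interpolation on a log10 concentration scale used below, and the function names, are assumptions made for illustration and are not taken from the paper.

```python
import math


def ec50_by_interpolation(conc, reversal):
    """Estimate the concentration giving 50% reversal.

    conc     : increasing, strictly positive concentrations
    reversal : fraction (0..1) of the accumulation difference between
               Pgp-negative and Pgp-positive cells that is reversed
               at each concentration
    Returns None when 50% reversal is not bracketed by the data.
    """
    for i in range(len(conc) - 1):
        r0, r1 = reversal[i], reversal[i + 1]
        if r0 < 0.5 <= r1:
            # Assumed: straight line between neighbouring points
            # on a log10 concentration axis.
            t = (0.5 - r0) / (r1 - r0)
            x0, x1 = math.log10(conc[i]), math.log10(conc[i + 1])
            return 10 ** (x0 + t * (x1 - x0))
    return None


def relative_potency(ec50_lead, ec50_compound):
    """Fold increase in potency over the lead compound."""
    return ec50_lead / ec50_compound
```

For example, dividing the rounded Pgp-specific DOX values for progesterone (42.2) and compound 3 (0.7) gives a ratio of about 60, in line with the relative potency listed in Table 2; small differences from the tabulated figures are expected because the table reports rounded EC50 values.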

While 3 and 5 tend to be slightly more potent than 4,
the difference is not statistically significant. This suggests
that addition of either a Cl (3) or a second aromatic
ring (5) does not further increase activity. In marked
contrast, the presence of the sulfonyl group in 6 eliminates
the gain in activity conferred by the C-7 moiety.
Thus, the increased activity in 5 is not simply due to
the presence of an aromatic F ring (Scheme 1). Further
structural modifications will allow us to test further the
structure-activity relationships of C-7 progesterone
analogues for Pgp reversal.

A major problem with many existing antiPgp compounds
is their intrinsic toxicity. We wished to obtain
an in vitro assessment of the toxicity of our compounds
relative to progesterone, verapamil, and cyclosporin A.
We used our breast cancer cell models because they do
not express PGR and would provide a simple model for
assessing PGR-independent cytotoxicity. Furthermore,
any reduction in toxicity seen in the MDA435/LCC6MDR1
cells, as compared with the MDA435/LCC6 cells (relative
resistance of Pgp-positive cells in Table 3), would
suggest that the compounds were Pgp substrates, not
simply Pgp inhibitors. Results are summarized in terms
of IC50 in Table 3; representative dose-response curves
are shown in Figure 2.

To estimate relative activity, each drug's intrinsic
cytotoxicity was expressed relative to its antiPgp activity
(IC50/EC50; Table 3). We did not detect cytotoxicity
for compounds 4 and 5, due to their low solubility,
rendering our ratios underestimates based on the highest
(noncytotoxic) concentration tested. Nonetheless, 4
produces ~40% inhibition of proliferation at 20 μM. In
marked contrast, 20 μM 5 does not inhibit proliferation
significantly in either untreated cells or MDA435/LCC6
and MDA435/LCC6MDR1 cells.
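
The IC50/EC50 ratio is simple arithmetic, sketched below. One assumption is made and should be read as such: the numerator is taken to be the IC50 in parental MDA435/LCC6 cells (Table 3) and the denominator the Pgp-specific EC50 (Table 2). The paper does not state this pairing explicitly, but it reproduces several tabulated ratios, including those for verapamil and compound 4. The function name and the lower-bound flag are hypothetical.

```python
def activity_ratio(ic50, ec50, ic50_is_lower_bound=False):
    """Ratio of intrinsic cytotoxicity (IC50) to antiPgp potency (EC50).

    When no IC50 was reached, pass the highest concentration tested
    and set ic50_is_lower_bound=True; the result is then reported
    as a lower bound (prefixed with '>').
    """
    ratio = ic50 / ec50
    prefix = ">" if ic50_is_lower_bound else ""
    return f"{prefix}{ratio:.1f}"


# Verapamil: IC50 65.8 uM; Pgp-specific EC50 3.1 uM (VBL), 4.1 uM (DOX)
print(activity_ratio(65.8, 3.1))   # 21.2
print(activity_ratio(65.8, 4.1))   # 16.0

# Compound 4: no IC50 reached at 20 uM, so the ratio is a lower bound
print(activity_ratio(20.0, 1.5, ic50_is_lower_bound=True))  # >13.3
```

A higher ratio means that Pgp is inhibited at concentrations well below those that are cytotoxic, which is the sense in which Table 3 treats larger values as indicating a greater degree of safety.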

When  adjusted  for  cytotoxicity,  cyclosporin  A  and 
progesterone  exhibit  approximately  equivalent  relative 
activities.  The  low  estimates  for  cyclosporin  A  reflect 
its  substantial  toxicity.  Compound  6  is  the  least  active 
compound, and 5 is the most active despite the overestimation
of its cellular toxicity. While VBL and DOX
may  have  different  recognition  sites  in  Pgp,53  5  shows 
broadly  comparable  activity  against  both  drugs,  as  does 
cyclosporin  A. 

Having established that the C-7 addition significantly
increased antiPgp activity, we wished to evaluate the
PGR activity of our best compound. Overall, 3 and 5
have antiPgp activity comparable to cyclosporin A.
Because 3 exhibits significant cellular toxicity, we chose
to evaluate the relative affinity of 5 for binding to PGR.
We compared the ability of 5, progesterone, and unlabeled
ORG2058, a synthetic progestin, to compete with
[3H]ORG2058 for binding to PGR. The data in Figure
Table 3. Growth Inhibitory Activity of C-7 Progesterone Analogues, Progesterone, Verapamil, and Cyclosporin A on MDA435/LCC6
(Pgp Negative) and MDA435/LCC6MDR1 (Pgp Positive) Human Breast Cancer Cells

                IC50 (μM)                                           relative resistance     IC50/EC50 c
compd           MDA435/LCC6                MDA435/LCC6MDR1          of Pgp-positive         VBL         DOX
                (relative cytotoxicity) a  (relative cytotoxicity)  cells b                 activity    activity
progesterone    27.4 ± 7.9 d (1.0)         30.4 ± 8.3 (1.0)         1.4 ± 0.09              1.3         0.6
3               3.2 ± 0.09 (8.5)           7.3 ± 2.5 (5.0)          2.2 ± 0.7               3.6         3.2
4               >20.0 (ND) e               >20.0 (ND)               ND                      >13.3       >20.0
5               >20.0 (ND)                 >20.0 (ND)               ND                      >28.0       >22.2
6               22.1 ± 1.6 (1.2)           38.2 ± 0.2 (1.0)         1.7 ± 0.1               0.7         0.6
verapamil       65.8 ± 0.04 (0.4)          63.4 ± 1.2 (0.6)         1.0 ± 0.02              21.2        16.0
cyclosporin A   1.0 ± 0.5 (26.9)           1.1 ± 0.2 (33.6)         1.3 ± 0.4               1.7         1.4

a Relative cytotoxicity = ability of compounds to inhibit cell growth relative to progesterone. b Relative resistance of Pgp-positive
cells = ratio of toxicity in resistant and control cells. Relative resistance > 1 suggests that the compound is at least partly effluxed by Pgp. c This
ratio relates the effect of each drug on either VBL or DOX accumulation to its intrinsic cellular toxicity; higher ratios suggest a greater
degree of safety. d Values represent the mean ± SE obtained from at least three independent experiments. e ND = no data; an IC50 was
not reached at the highest concentration tested (limited by solubility). Some values (>) are underestimates based on nontoxic concentrations.


Figure 2. Cytotoxicity of progesterone analogues in MDA435/LCC6
(A) and MDA435/LCC6MDR1 cells (B). Data (mean ± SE)
are from one of three or more representative experiments used
to obtain the IC50 values presented in Table 3. Progesterone
= •, cyclosporin A = ○, verapamil = ▲, 3 = ■, 4 = △, 5 = □,
and 6 = ▽.

3 show that 5, at concentrations up to its EC50, does
not significantly compete with ORG2058. This represents
a reduction of >100-fold in its PGR affinity as
compared with ORG2058. Thus, 5 has antiPgp activity
comparable to cyclosporin A, exhibits potentially low
intrinsic cellular toxicity, and does not bind to its
predicted cellular target (PGR) at its EC50 for inhibition
of Pgp activity.

Conclusions 

While we cannot draw definitive structure-activity
conclusions, some potentially useful preliminary observations
can guide future studies. The molecules are
clearly amphipathic, with lipophilicity greatest around
the "E" and/or "F" rings and the polarity greatest around
C-17-C-21. These observations suggest that effective
substrates may concurrently interact with both hydrophilic
and hydrophobic regions. However, it is not clear
whether these are both in Pgp as previously suggested31
or whether they represent pockets at the plasma
membrane/Pgp interface. The possibility that drugs are
removed from within the plasma membrane6 may favor
the model that invokes a plasma membrane component
to the binding interaction.

Figure 3. Competitive binding of progesterone, ORG2058,
and 5 to PGR. [3H]ORG2058 was used as the radiolabeled
ligand. Progesterone = •, ORG2058 = ○, and 5 = □.

Compounds 3-5 are significantly more potent than
progesterone at specifically increasing Pgp substrate
accumulation. These observations are consistent with
our initial hypothesis that aromatic C-7 substitutions
of progesterone will increase activity and with the
known contribution of aromatic moieties in other modulating
agents.37,42 Compound 6 also contains a C-7
aromatic addition but is essentially equipotent with
progesterone. Perhaps the simplest explanation is that
this compound is the least lipophilic of the analogues,
since lipophilicity appears to be a major factor in the
activity of other Pgp substrates.31-35

The significant increase in potency observed with
compounds 3-5 supports our initial structure-function-based
hypothesis, which drew on previously published
observations. The activity of our existing compounds already
compares well with that of cyclosporin A. C-7 progesterone
analogues have the potential to provide more
potent, selective, and safe inhibitors of Pgp function
than others that have currently completed clinical trials.
We believe that the observations reported here, combined
with the lack of receptor binding activity, identify
5 as the next logical lead compound for further development
and provide valuable clues for the further optimization
of this structure. We are currently synthesizing
a larger series of compounds to further optimize the
MDR1 reversing potency and effectively define the
structure-activity relationships of these compounds.

Our ability to increase the potency of progesterone
up to 60-fold (3; Pgp-specific EC50 for DOX accumulation)
supports the use of relatively limited structure-function
data in the design of effective antiPgp compounds.
Furthermore, by including structure-activity
information on the binding characteristics of the lead's
natural intracellular target (PGR) in our conceptualization,
we reduced affinity of 5 for a target that could
produce toxicity in normal cells. We are now poised to
evaluate our compounds in vivo, to pursue further
modifications that may increase antiPgp activity, and
to explore the structure-activity relationship for C-7
progesterone analogues in detail. Overall, the data in
this study identify C-7-substituted progesterone analogues
and 5, in particular, as rationally designed
antiPgp compounds worthy of further evaluation/
development.

Experimental  Section 

Chemistry. General Procedures. All reactions were
carried out under an atmosphere of nitrogen using standard
Schlenk techniques.54 Benzene and chloroform were distilled
from CaH2, stored over 3 Å molecular sieves, and deaerated
by purging with nitrogen immediately before use. TLC was
performed using Merck glass plates precoated with F254 silica
gel 60; compounds were visualized by UV and/or with
p-anisaldehyde stain solution. Flash chromatography was
performed using EM Science silica gel 60, following the procedure
of Still,55 with the solvent mixtures indicated. Melting points
were measured on a Thomas-Hoover capillary melting point
apparatus and are uncorrected (Table 1). The broad melting
points for compounds 3-6 suggest the presence of minor
impurities. All reagents were purchased from commercial
suppliers and used as received unless indicated otherwise.
Dioxane was purchased from Aldrich in Sure Seal bottles.

Nuclear magnetic resonance (NMR) spectra were measured
on Nicolet NT 270 and Varian Mercury 300 MHz instruments
at the Georgetown NMR Facility. Chemical shifts are reported
in units of parts per million relative to Me4Si. All spectra were
recorded in CDCl3. Significant 1H NMR data are tabulated in
the following order: multiplicity (s, singlet; d, doublet; t,
triplet; q, quartet; m, multiplet), coupling constants in hertz,
and number of protons. 13C NMR spectra were recorded at
frequencies of 67.9 and 75.6 MHz. Infrared (IR) spectra were
measured on a MIDAC Corp. or a Mattson Galaxy 2020 Series
FTIR, as neat films; absorption bands are reported in cm⁻¹.
Low-resolution mass spectra were measured on a Fisons
Instruments MD 800 quadrupole mass spectrometer, with 70
eV electron ionization and a GC 8000 Series gas chromatograph
inlet, and using a J & W Scientific DB5MS column (15
m length, 0.25 mm internal diameter, 0.25 µm film thickness).
Mass spectra data are given as mass-to-charge ratio, with the
relative peak height following in parentheses. All new
compounds were characterized by 1H NMR, IR, and 13C NMR
spectroscopies. Fast atom bombardment mass spectra (FABMS)
were recorded at the University of Maryland College Park
Mass Spectrometry Facility. Literature references are given
for all known compounds, except for those that are
commercially available; all known compounds were identified by
1H NMR spectroscopy. Elemental analysis was performed by
Atlantic Microlab (Norcross, GA).

Pregna-4,6-diene-3,20-dione (1). Compound 1 was prepared
by the method of Turner and Ringold.47-49 Thus, p-TsOH
monohydrate (11.0 g, 63.9 mmol) was dehydrated in freshly
distilled benzene (320 mL) by azeotropic refluxing using a
Dean-Stark trap. After 1 h, the solution was cooled for 0.5 h,
and progesterone (5.0 g, 15.9 mmol) and DDQ (4.6 g, 20.3
mmol) were added. The olive mixture was refluxed for 3 h and
then filtered through a pad of Celite. The filtrate was washed
with saturated NaCl (5 × 20 mL) followed by 1% NaOH
solution until it gave a clear solution and dried over anhydrous
MgSO4. Solvent was removed under reduced pressure, and the
residue was purified by chromatography; 1.69 g of product
(35%, Rf = 0.44, 2:3 hexanes-ethyl acetate); yellow solid (mp
= 143-145 °C). 1H NMR: δ 6.12 (s, 1H), 5.69 (s, 1H), 2.84-
1.12 (complex, 12H), 2.17 (s, 3H), 2.14 (s, 3H), 1.12 (s, 3H),
1.10 (s, 1H), 1.00 (s, 1H), 0.72 (s, 3H). IR: 3855, 3745, 3678,
2953, 1700, 1663, 1457, 1361, 3223, 875, 754.

7α-[4'-(Aminophenyl)thio]pregna-4-ene-3,20-dione (2).
We obtained 2 using the method of Brueggemeier et al.56
Briefly, 1 (1.65 g, 5.28 mmol), 4-aminothiophenol (1.32 g, 10.56
mmol), and NaOH (pellet, 116 mg, 2.9 mmol) were placed in
a Schlenk tube, which was purged with a constant flow of
N2(g). Deoxygenated anhydrous dioxane (25 mL) was added,
and the mixture was heated at 74 °C for 6 days. The mixture
was concentrated under reduced pressure and purified by
chromatography; 790 mg of white solid (61%, Rf = 0.23, 2:3
hexanes-ethyl acetate; mp = 228-230 °C). 1H NMR: δ 7.26-7.21
(q, J = 8.5 Hz, 2H), 6.64-6.61 (q, J = 8.5 Hz, 2H), 5.73 (s, 1H),
3.77 (s, 2H), 3.24 (s, 1H), 2.14 (s, 3H), 2.63-1.10 (complex,
11H), 1.19 (s, 3H), 0.69 (s, 3H). IR: 3420, 3360, 3250, 2930,
1700. 13C NMR: δ 209.3, 199.0, 167.6, 147.1, 136.6, 127.3,
121.2, 115.7, 63.4, 17.7, 13.1.

General Procedure for the Preparation of Progesterone
Analogues. A suspension of 2 in degassed chloroform was
treated with the appropriate isocyanate under N2. The
mixture was stirred for 12 h and then chromatographed
directly on silica gel to afford the corresponding urea as an oil.
The resulting oil was stirred in anhydrous ether until a white
powder precipitated.

7α-[4'-(N-Chloroethylaminoacylaminophenyl)thio]pregna-4-ene-3,20-dione
(3). Reaction of 2 (0.10 g, 0.23
mmol) with 2-chloroethylisocyanate (38 µL, 0.46 mmol) in
CHCl3 (3.0 mL) for 12 h gave 50 mg of product (40%, mp =
137-141 °C, Rf = 0.47, 2:3 hexanes-ethyl acetate). 1H NMR:
δ 7.34-7.25 (m, 4H), 5.69 (s, 1H), 5.18 (s, 1H), 3.68-3.62 (m,
4H), 3.38 (s, 1H), 2.64-0.84 (complex, 18H), 2.14 (s, 3H), 1.19
(s, 3H), 0.69 (s, 3H). IR: 3312, 2964, 1700, 1630, 1587, 1517,
1488, 1449, 1394, 1238, 1013, 831, 734. 13C NMR: δ 231.5,
210.3, 196.2, 193.9, 181.9, 156.5, 149.3, 146.4, 141.4, 132.9,
125.1, 119.5, 118.5, 103.2, 94.2, 75.9, 75.8, 71.9, 69.3, 49.0, 35.8,
24.2, 14.4. MS: m/e = 543 (24, M+ + 1), 507 (10), 313 (27),
230 (23), 185 (50), 149 (69), 125 (57), 119 (23), 107 (38), 105
(48), 91 (50), 81 (50), 57 (73), 55 (100). HRMS: calcd for
C30H39N2O3SCl [M + H]+, 543.24481; found, 543.24248. Anal.
Calcd for (C30H40O3N2SCl): C, 66.22; H, 7.41; N, 8.82; S, 6.52.
Found: C, 66.38; H, 7.27; N, 8.78; S, 6.28.

7α-[4'-(N-Ethylaminoacylaminophenyl)thio]pregna-4-
ene-3,20-dione (4). Reaction of 2 (0.10 g, 0.23 mmol) with
ethyl isocyanate (37 µL, 0.46 mmol) in CHCl3 (3.0 mL) for 12
h gave 78 mg of product (67%, mp = 130-135 °C, Rf = 0.36,
2:3 hexanes-ethyl acetate). 1H NMR: δ 7.36-7.25 (m, 4H),
6.38 (s, 1H), 5.69 (s, 1H), 4.18-4.03 (m, 2H), 3.38-3.26 (m,
2H), 2.67-0.68 (complex, 17H), 2.14 (s, 3H), 2.05 (s, 2H), 1.20
(s, 3H), 0.69 (s, 3H). IR: 3855, 3745, 3678, 3373, 2953, 2359,
1700, 1663, 3539, 1457, 1223. 13C NMR: δ 228.5, 222.5, 193.9,
171.5, 141.5, 135.0, 128.3, 127.0, 123.4, 118.5, 108.7, 96.2, 84.5,
69.3, 67.7, 66.0, 62.6, 52.2, 48.4, 46.3, 43.9, 39.8, 34.1, 22.9,
21.2, 13.4. MS: m/e = 509 (62, M+ + 1), 438 (8), 313 (32), 196
(47), 125 (100), 117 (57), 97 (52), 95 (85), 79 (68), 71 (59).
HRMS: calcd for C30H40N2O3S [M + H]+, 509.28378; found,
509.28372. Anal. Calcd for (C30H41O3N2S): C, 70.69; H, 8.11;
N, 5.49; S, 6.29. Found: C, 70.46; H, 8.06; N, 5.52; S, 6.20.

7α-[4'-(N-α-(+)-Methylbenzylaminoacylaminophenyl)-
thio]pregna-4-ene-3,20-dione (5). Reaction of 2 (0.10 g, 0.23
mmol) with (R)-(+)-α-methylbenzyl isocyanate (60 µL, 0.46
mmol) in CHCl3 (3.0 mL) for 12 h gave 56 mg of product (46%,
mp = 146-149 °C, Rf = 0.46, 2:3 hexanes-ethyl acetate). 1H
NMR: δ 7.32-7.25 (m, 5H), 5.79-5.77 (m, 1H), 5.70-5.68 (s,
1H), 4.97-4.92 (m, 1H), 4.13-4.06 (m, 1H), 3.28 (s, 1H), 2.64-
1.49 (complex, 7H), 2.14 (s, 3H), 1.45 (d, J = 9.3 Hz, 3H), 1.19
(s, 3H), 0.68 (s, 3H). IR: 3353, 3273, 2949, 2854, 2362, 2340,
1700, 1653, 1595, 1539, 1457, 1460, 1376, 1343, 1159, 1089,
916. 13C NMR: δ 209.4, 199.0, 167.6, 147.0, 136.6, 127.2, 121.2,
115.8, 63.4, 17.7, 13.1. MS: m/e = 585 (11, M+ + 1), 135 (12),
125 (20), 105 (100), 103 (22), 91 (29), 77 (22), 55 (26). HRMS:
calcd for C36H44N2O3S [M + H]+, 585.31506; found, 585.31501.
Anal. Calcd for (C36H45O3N2S): C, 73.81; H, 7.74; N, 4.78; S,
5.47. Found: C, 73.76; H, 7.79; N, 4.81; S, 5.39.

7α-[4'-(N-p-Toluenesulfonylaminoacylaminophenyl)-
thio]pregna-4-ene-3,20-dione (6). Reaction of 2 (0.10 g, 0.23
mmol) with p-toluenesulfonylisocyanate (59 µL, 0.46 mmol) in
CHCl3 (3.0 mL) for 12 h gave 120 mg of product (83%, mp =
128-132 °C, Rf = 0.29, 2:3 hexanes-ethyl acetate). 1H NMR:
δ 8.38 (s, 1H), 7.88 (d, J = 8.4 Hz, 2H), 7.80 (d, J = 8.3 Hz,
2H), 7.37-7.25 (m, 4H), 5.70 (s, 1H), 3.36 (s, 1H), 2.67-1.13
(complex, 20H), 2.41 (s, 3H), 2.15 (s, 3H), 1.55 (m, 2H), 1.20
(s, 3H), 0.69 (s, 3H). IR: 3855, 3745, 2359, 1700, 1539, 1457,
1160, 1086, 668. 13C NMR: δ 198.6, 148.6, 141.4, 136.6, 134.6,
129.9, 129.7, 129.6, 127.7, 127.2, 126.4, 120.5, 118.5, 92.4, 76.1,
69.3, 63.3, 52.1, 51.1, 47.0, 46.3, 39.8, 39.4, 38.5, 38.1, 35.4,
34.0, 31.6, 23.7, 22.9, 21.8, 21.1, 17.9, 13.4. MS: m/e = 635
(29, M+ + 1), 313 (39), 155 (33), 135 (36), 125 (65), 119 (64),
91 (100), 85 (92), 77 (47), 59 (50), 47 (45). HRMS: calcd for
C35H42N2O5S2 [M + H]+, 635.26135; found, 635.26130. Anal.
Calcd for (C35H43O5N2S2): C, 66.11; H, 6.82; N, 4.41; S, 10.09.
Found: C, 66.05; H, 6.79; N, 4.45; S, 10.02.
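
The HRMS "calcd" values quoted above for 4 and 6 can be checked from the printed formulas. The following is a minimal sketch, assuming standard monoisotopic masses and assuming that the quoted [M + H]+ values correspond to the neutral formula plus one hydrogen atom (no electron-mass correction), which is what the printed numbers appear to match. The function names are illustrative and not part of the original work.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207100,
    "Cl": 34.96885268,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C30H40N2O3S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


def mh_plus(formula: str) -> float:
    """Neutral mass plus one hydrogen atom (assumption: the electron
    mass is not subtracted, to mirror the values quoted in the text)."""
    return monoisotopic_mass(formula) + MONOISOTOPIC_MASS["H"]


def ppm_error(found: float, calcd: float) -> float:
    """Signed deviation of the found mass from the calculated mass, in ppm."""
    return (found - calcd) / calcd * 1e6


# (formula, calcd, found) as printed for compounds 4 and 6.
reported = {
    "4": ("C30H40N2O3S", 509.28378, 509.28372),
    "6": ("C35H42N2O5S2", 635.26135, 635.26130),
}
for compound, (formula, calcd, found) in reported.items():
    print(compound, round(mh_plus(formula), 5), round(ppm_error(found, calcd), 2))
```

Under these assumptions the recomputed masses agree with the printed calcd values to within rounding, and the found values for 4 and 6 lie well within 1 ppm of them.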

Pharmacology. Cell Lines. For the studies of antiPgp
activity, we used cells transduced with a retroviral vector
directing the constitutive expression of the Pgp gene (MDA435/
LCC6MDR1) and their parental, Pgp-negative, MDA435/LCC6
human breast cancer cells.17 Both MDA435/LCC6 and MDA435/
LCC6MDR1 cells are estrogen receptor and PGR negative and
grow as monolayer cultures in vitro and as rapidly proliferating
solid tumors and malignant ascites in vivo in nude mice.17
We used MCF-7 human breast cancer cells57 to measure
binding to PGR. These cells were routinely grown in vitro in
Improved Minimal Essential Media (Biofluids) containing 5%
fetal bovine serum in a 5% CO2:95% air atmosphere.

Substrate Accumulation Assays. Pgp reversing activity
of all test agents was evaluated by measuring the ability of
the agents to affect accumulation of DOX and VBL in MDA435/
LCC6MDR1 (resistant) and MDA435/LCC6 (control) cells. Cells
were plated at 2.5 × 10⁵ cells/well into 24 well culture dishes,
in routine growth media, and incubated at 37 °C. Forty-eight
hours after they were plated, cells were treated by replacing
growth media with media containing the test compounds at
five different concentrations and either DOX (4 µM) or [3H]
VBL (5 nM). All treatments were carried out in triplicate. After
3 h of incubation, treatments were stopped by washing each
well with 0.5 mL of ice-cold NaCl (0.15 M). Cells from reference
wells in each plate were counted to enable accumulation to be
corrected for cell number.

For the DOX accumulation assays, DOX was extracted from
cell monolayers by adding 1.5 mL of 20% trichloroacetic acid
to each well and incubating overnight at 4 °C in the dark. DOX
concentrations in the extracts were evaluated fluorometrically.
Thus, extracts were transferred into 13 mm × 100 mm
borosilicate glass tubes, placed in the 10 × 10 rack of a Hitachi
A3000 Autosampler and connected to a Hitachi F-4500
fluorescence spectrophotometer. Fluorescence was read at 500 nm
excitation and 580 nm emission wavelengths. Concentrations
of DOX were obtained by interpolation on a DOX standard
curve and normalized on the extraction volume and number
of cells per well. For the VBL accumulation assays, at the end
of treatment, wells were rinsed with phosphate-buffered saline
(0.5 mL/well) and left to dry at room temperature. Cell
monolayers were removed by trypsinization and diluted with
10 mL of scintillation fluid (Ultima Gold XR, Packard
Bioscience, Meriden, CT). Drug accumulation was radiometrically
assessed by scintillation spectrometry7.
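
The DOX quantification step described above (interpolation on a standard curve, then normalization to extraction volume and cell number) can be sketched as follows. This is only an illustration: the standard-curve readings, the sample reading, and the cell count are hypothetical, the helper names are not from the original work, and piecewise-linear interpolation is an assumption because the text does not state the form of the standard curve.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("reading lies outside the standard curve")
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def dox_nmol_per_million_cells(fluorescence, std_fluorescence, std_conc_uM,
                               extract_mL, cells_per_well):
    """Convert a fluorescence reading to nmol DOX per 10^6 cells."""
    conc_uM = interpolate(fluorescence, std_fluorescence, std_conc_uM)
    nmol_in_extract = conc_uM * extract_mL  # 1 µM equals 1 nmol/mL
    return nmol_in_extract / (cells_per_well / 1e6)


# Hypothetical standard curve (fluorescence units vs µM DOX).
std_fluorescence = [0.0, 50.0, 100.0, 200.0]
std_conc_uM = [0.0, 0.5, 1.0, 2.0]

# Hypothetical sample: 1.5 mL extract from a well counted at 2.5e5 cells.
sample = dox_nmol_per_million_cells(75.0, std_fluorescence, std_conc_uM,
                                    extract_mL=1.5, cells_per_well=2.5e5)
print(sample)
```

Readings outside the range of the standards raise an error rather than being extrapolated, which seems the safer default for a calibration of this kind.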

Results of substrate accumulation assays were plotted as
cellular concentration of substrate against the concentration
of the respective test compound. Pgp reversing potency was
expressed as the EC50, defined as the concentration of a test
drug that reduced the difference in substrate accumulation
between Pgp-negative and Pgp-positive cells by 50%.
Progesterone and the standard Pgp reversing agents verapamil and
cyclosporin A were used as positive controls and as a reference
to establish relative potency.
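
The EC50 definition above can be made concrete with a short sketch. It assumes that the accumulation gap between Pgp-negative and Pgp-positive cells is expressed relative to the untreated gap and that the 50% point is found by interpolating between the two bracketing concentrations on a log scale; the interpolation scale is an assumption, and all numbers are hypothetical.

```python
import math


def ec50_from_accumulation(conc, accum_neg, accum_pos,
                           neg_untreated, pos_untreated):
    """Concentration at which the Pgp-negative minus Pgp-positive
    accumulation gap falls to 50% of the untreated gap.

    Returns None if the gap never falls to 50% in the tested range.
    """
    gap0 = neg_untreated - pos_untreated
    remaining = [(n - p) / gap0 for n, p in zip(accum_neg, accum_pos)]
    points = list(zip(conc, remaining))
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 > 0.5 >= r1:
            frac = (r0 - 0.5) / (r0 - r1)
            log_ec50 = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_ec50
    return None


# Hypothetical data: test-compound concentrations (µM) and substrate
# accumulation (arbitrary units) in Pgp-negative and Pgp-positive cells.
conc = [0.1, 1.0, 10.0, 100.0]
accum_neg = [100.0, 100.0, 100.0, 100.0]
accum_pos = [20.0, 40.0, 80.0, 95.0]

ec50 = ec50_from_accumulation(conc, accum_neg, accum_pos,
                              neg_untreated=100.0, pos_untreated=20.0)
print(ec50)
```

In this made-up example the gap is 75% of its untreated value at 1 µM and 25% at 10 µM, so the interpolated EC50 falls midway between them on the log scale.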

Cytotoxicity. Twenty-four hours after they were plated in
96 well plates, MDA435/LCC6 and MDA435/LCC6MDR1 cells
were exposed to growth media containing different
concentrations of the test agents for 5 days. Cell cultures were then
fixed/stained by incubation in a 0.5% (w/v) crystal violet
solution in 25% methanol (v/v). After plates had dried, the dye
was extracted in 0.1 M sodium citrate in 25% methanol (v/v)
and absorbance was read at 540 nm using a microplate
spectrophotometer. Absorbance directly correlates with cell
number in this assay.60 Cell survival curves were obtained by
plotting absorbance values (as percent of untreated controls)
against drug concentration. The toxicity of each drug was
expressed as an IC50, defined as the concentration inhibiting
cell density by 50% at the end of the treatment period. To
estimate the extent of resistance conferred by Pgp, the ratio
of each drug's IC50 in MDA435/LCC6MDR1 and MDA435/LCC6
cells (relative resistance of Pgp-positive cells) was used for
those drugs that produced a detectable IC50 value.
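
The relative-resistance calculation described above can be sketched as follows. The absorbance readings are hypothetical, the function names are illustrative, and log-scale interpolation between bracketing concentrations is an assumption. A drug that never reduces survival to 50% yields no IC50 and therefore no ratio, mirroring the "detectable IC50" condition in the text.

```python
import math


def ic50(conc, absorbance, control_absorbance):
    """Concentration reducing absorbance to 50% of the untreated control.

    Returns None when survival stays above 50% over the tested range.
    """
    survival = [100.0 * a / control_absorbance for a in absorbance]
    points = list(zip(conc, survival))
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 > 50.0 >= s1:
            frac = (s0 - 50.0) / (s0 - s1)
            return 10 ** (math.log10(c0)
                          + frac * (math.log10(c1) - math.log10(c0)))
    return None


def relative_resistance(ic50_mdr1, ic50_parental):
    """IC50 ratio of Pgp-positive to parental cells, or None if either
    IC50 was not detectable."""
    if ic50_mdr1 is None or ic50_parental is None:
        return None
    return ic50_mdr1 / ic50_parental


# Hypothetical crystal violet absorbances (control wells read 1.0).
conc_nM = [1.0, 10.0, 100.0, 1000.0]
parental = ic50(conc_nM, [0.9, 0.6, 0.4, 0.1], control_absorbance=1.0)
mdr1 = ic50(conc_nM, [1.0, 0.95, 0.7, 0.3], control_absorbance=1.0)
print(parental, mdr1, relative_resistance(mdr1, parental))
```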

Radioligand Binding Studies. These were performed as
previously described, using a whole cell competitive binding
assay.57,58 Briefly, MCF-7 cells were grown in 24 well dishes
and incubated at 37 °C with 100 nM hydrocortisone for 30 min,
before determining PGR binding, to eliminate residual binding
to glucocorticoid receptors. Subsequently, cells were incubated
for 60 min at 37 °C with 5 nM [3H]ORG2058 (specific activity
50.6 Ci/mmol) in the absence or presence of increasing
concentrations of unlabeled competitor (0.5 nM-1 µM;
progesterone, ORG2058, 5). Radioactivity was extracted into ethanol
and measured in a liquid scintillation spectrometer.

Data Analysis. DOX accumulation and cytotoxicity dose
response data were processed and graphed using SigmaPlot
4.0 (SPSS Science, Chicago, IL). EC50 (DOX accumulation
assays) and IC50 values (cytotoxicity assays) were calculated
by interpolation on the respective dose response curves. The
EC50 and IC50 values reported in Tables 2 and 3 represent the
mean and standard error (SE) obtained from at least three
independent experiments. Descriptive statistics were obtained
using SigmaStat 2.0 (SPSS Science).
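
As a small illustration of the summary statistic described above, the mean and standard error of EC50 or IC50 values from independent experiments can be computed as below. The input values are hypothetical, and the SE is taken as the sample standard deviation divided by the square root of the number of experiments, which is the usual definition but is not spelled out in the text.

```python
import math
import statistics


def mean_and_se(values):
    """Mean and standard error of the mean for replicate experiments."""
    if len(values) < 3:
        raise ValueError("expected at least three independent experiments")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical EC50 values (µM) from three independent experiments.
mean_ec50, se_ec50 = mean_and_se([1.0, 2.0, 3.0])
print(mean_ec50, se_ec50)
```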